This is a non-provisional application based on,
and claims the benefit of, U.S. Provisional Application 
No. 60/029,322 filed October 25, 1996, the content of 
which is incorporated herein by reference in its 
entirety. 
FIELD OF THE INVENTION

The present invention relates to nucleic acids 
and proteins encoded thereby. Invention nucleic acids 
encode a novel N-CAM member of the immunoglobulin 
superfamily of proteins. The invention also relates to 
methods for making and using such nucleic acids and
proteins.

BACKGROUND OF THE INVENTION 

Research spanning the last decade has 
significantly elucidated the molecular events attending 
cell-cell interactions in the body, especially those 
events involved in the movement and activation of cells 
in the immune system. See generally, Springer et al.,
Nature 346:425-434, 1990. Cell surface proteins, and
especially the so-called Cellular Adhesion Molecules
("CAMs"), have correspondingly been the subject of
pharmaceutical research and development aimed at
intervening in the processes of leukocyte
extravasation to sites of inflammation and leukocyte
movement to distinct target tissues. The isolation and
characterization of cellular adhesion molecules, the
cloning and expression of DNA sequences encoding such
molecules, and the development of therapeutic and
diagnostic agents relevant to inflammatory processes, viral
infection and cancer metastasis have also been the
subject of numerous U.S. and foreign applications for
Letters Patent. See Edwards, Current Opinion in
Therapeutic Patents 1(11):1617-1630, 1991, and
particularly the published "patent literature references"
cited therein.

Numerous CAMs have been characterized to date.
See, for example, vascular cell adhesion molecule (VCAM-1)
as described in PCT WO 90/13300; platelet endothelial cell
adhesion molecule (PECAM-1) described in Newman et al.,
Science 247:1219-1222, 1990, and PCT WO 91/10683; and the
following U.S. Patents: 5,525,487; 5,235,049; 5,272,263;
5,489,233; 5,264,554; 5,318,890; 5,389,520; 5,519,008;
and the like.

There is substantial evidence that N-CAM and
its relatives play an important part in neural
development (Edelman and Crossin, "CELL ADHESION
MOLECULES: Implications for a Molecular Histology", Ann.
Rev. Biochem. 60:155-190, 1991; and Walsh and Doherty,
Curr. Opinion in Cell Biol. 5:791-796, 1993). For
example, antibodies directed against N-CAMs disturbed the
normal growth pattern of nerve processes. N-CAM (locus
11q23.1) is expressed in large amounts in cells of the
developing neural tube, but when neural crest cells
dissociate from the neural tube and migrate away, they
lose N-CAM, only to reexpress it later when they
reaggregate to form a neural ganglion. In addition,
Rosenthal et al. (Nature Genet. 2:107-112, 1992)
reported that mutations in CAM-L1 (locus Xq28) cause
X-linked hydrocephalus, and Jouet et al. (Nature Genet.
7:402-407, 1994) showed that mutations in the CAM-L1 gene
are responsible for type 1 X-linked spastic paraplegia and
MASA syndrome, which shows agenesis of the corpus
callosum. Therefore, there is a need in the art to
identify and isolate novel N-CAM members of the
immunoglobulin superfamily so that their role in neural
development and neural cell communication can be
determined.

Accordingly, there continues to be a need in the
art for the discovery of additional proteins
participating in human cell-cell interactions and
especially a need for information serving to specifically
identify and characterize such proteins in terms of their
amino acid sequence. Moreover, to the extent that such
molecules might form the basis for the development of
therapeutic and diagnostic agents, it is essential that
the DNA encoding them be elucidated. The present
invention satisfies this need and provides related
advantages as well.



BRIEF DESCRIPTION OF THE INVENTION

In accordance with the present invention, there
are provided isolated nucleic acids encoding novel
mammalian N-CAM (neural cell adhesion molecule) members
of the immunoglobulin superfamily of proteins, referred
to herein as Down Syndrome-Cell Adhesion Molecules
(DS-CAMs). Further provided are vectors containing
invention nucleic acids, probes that hybridize thereto,
host cells transformed therewith, antisense
oligonucleotides thereto and related compositions. The
nucleic acid molecules described herein can be
incorporated into a variety of recombinant expression
systems known to those of skill in the art to readily
produce isolated DS-CAM proteins. In addition, the
nucleic acid molecules of the present invention are
useful as probes for assaying for the presence and/or
amount of a DS-CAM gene or mRNA transcript in a given
sample. The nucleic acid molecules described herein, and
oligonucleotide fragments thereof, are also useful as
primers and/or templates in a PCR reaction for amplifying
genes encoding DS-CAM proteins.

In accordance with the present invention, there 
are also provided isolated mammalian DS-CAM proteins. 
These proteins are useful, for example, in neural 
prosthetic devices used in entubulation methods of 
repairing (regenerating) damaged or severed peripheral 
nerves (see, e.g., U.S. Patent No. 4,955,892,
incorporated herein by reference). In addition, these
proteins, or fragments thereof, are useful as immunogens 
for producing anti-DS-CAM antibodies, or in therapeutic 
compositions containing such proteins and/or antibodies. 
Invention DS-CAM proteins are also useful in bioassays to 
identify agonists and antagonists thereto. Also provided 
are transgenic non-human mammals that express the 
invention protein. 

Antibodies that are immunoreactive with 
invention DS-CAM proteins are also provided. These 
antibodies are useful in diagnostic assays to determine
levels of DS-CAM proteins present in a given sample,
e.g., tissue samples, Western blots, and the like. The
antibodies can also be used to purify DS-CAM proteins
from crude cell extracts and the like. Moreover, these
antibodies are considered therapeutically useful to
counteract or supplement the biological effect of DS-CAMs
in vivo.
Methods and diagnostic systems for determining
the levels of DS-CAM protein in various tissue samples
are also provided. These diagnostic methods can be used
for monitoring the level of therapeutically administered
DS-CAM protein or fragments thereof to facilitate the
maintenance of therapeutically effective amounts. These
diagnostic methods can also be used to diagnose
physiological disorders that result from abnormal levels
or abnormal structures of the DS-CAM protein.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows a physical map of the
localization of the DS-CAM gene to a region between
D21S345 and D21S347 on chromosome 21. The locations of
BAC clones (starting with numbers) and PAC clones
(starting with "P") are indicated by horizontal bars. An
arrowhead indicates a gap in the BAC and PAC contig.
The location of the DS-CAM gene is indicated by a thick
arrow.

Figure 2 shows the predicted amino acid
sequence of the human DS-CAM1 protein corresponding to
SEQ ID NO: 2 and a schematic structure. IG:
Immunoglobulin type-C2 domain. FbN: Fibronectin type III
domain. The bold Cs in the amino acid sequence indicate
cysteine residues forming disulfide bonds in the Ig-like
type-C2 domains. The bold NXS and NXT motifs in the amino
acid sequence correspond to potential N-glycosylation sites.

Figure 3 shows a partial genomic structure of
DS-CAM1 and a deletion contained in DS-CAM2 cDNA clones
(clones pDS-CAM-18 and pDS-CAM-52). The deletion
boundary sequence (GC-AG) suggests unusual
alternative splicing. The horizontal bar represents
genomic sequence containing exons of DS-CAM-42. Exons
are indicated by open boxes. Exon-intron boundaries are
defined by a comparison of the cDNA sequence of
pDS-CAM-42 and genomic sequence determined from a BAC
clone.

Figure 4 shows a schematic comparison of 
neuronal Ig superfamily members. Ig-like type C-2 
domains, fibronectin type III domains and transmembrane 
domains are indicated. MAG: myelin-associated
glycoprotein, N-CAM: neural cell adhesion molecule, 
BIG-1: brain-derived immunoglobulin (Ig) superfamily 
molecule-1, DCC : deleted in colorectal carcinoma. 

DETAILED DESCRIPTION OF THE INVENTION

In accordance with the present invention, there
are provided isolated nucleic acids, which encode novel
mammalian members of the DS-CAM family of proteins, and
fragments thereof. The phrase "DS-CAM" refers to
substantially pure native DS-CAM protein, or
recombinantly produced proteins, including naturally
occurring allelic variants thereof encoded by mRNA
generated by alternative splicing of a primary
transcript, such as DS-CAM1 (SEQ ID NO: 2) and DS-CAM2
(SEQ ID NO: 11) disclosed herein, and further including
fragments thereof which retain at least one native
biological activity, such as immunogenicity. In one
aspect, invention DS-CAM proteins, such as DS-CAM1, are
cell-surface glycoproteins that are mobile in the plane
of the membrane. Invention DS-CAM1 proteins contain
extra- and intra-cellular domains that transduce
information from the outside of the cell to the cytoplasm
and the nucleus, thereby determining cell function. In
another aspect, invention DS-CAM proteins, such as
DS-CAM2, are non-membrane-bound, soluble proteins.



In one aspect of the invention, DS-CAM proteins
are further characterized as comprising at least 7
immunoglobulin-like (Ig-like) domains homologous to the
immunoglobulin superfamily and 6 type III fibronectin
repeats (see, e.g., Edelman and Crossin, "CELL ADHESION
MOLECULES: Implications for a Molecular Histology", Ann.
Rev. Biochem., 60:155-190, 1991; and Walsh and Doherty,
Curr. Opinion in Cell Biol., 5:791-796, 1993; each of
which is incorporated herein by reference in its
entirety). In another aspect of the invention, DS-CAM
proteins are those proteins comprising at least 8,
preferably at least 9 Ig-like domains, with at least 10
Ig-like domains being especially preferred.

As used herein, "Ig-like domains", or
grammatical variations thereof, refers to the well known
repeats that are common among Cell Adhesion Molecules
(CAMs) (see, e.g., Figure 1A at p. 158 of Edelman and
Crossin, supra, 1991; and Walsh and Doherty, supra, 1993;
each of which is incorporated herein by reference in its
entirety).

The phrase "type III fibronectin repeats",
"fibronectin repeats," or grammatical variations thereof,
refers to the well known repeats that are common among
Cell Adhesion Molecules (CAMs) (see, e.g., Figure 1A at
p. 158 of Edelman and Crossin, supra, 1991; and Walsh and
Doherty, supra, 1993; each of which is incorporated
herein by reference in its entirety).

The invention DS-CAM proteins define a novel
sub-class of the Ig (immunoglobulin) superfamily with
highest homologies to the neural cell adhesion molecules
including BIG-1 (Yoshihara et al., Neuron 13:415-426,
1994), CAM-L1 (Moos et al., Nature 334:701-703, 1988),
DCC (Fearon et al., Science 247:49-56, 1990), neogenin
(Lane et al., Genomics 35:456-465, 1996), and contactin
(Ranscht, J. Cell Biol. 107:1561-1573, 1988) (Figure 4).
It has been found that the structure of invention DS-CAM
proteins is unique within the neural immunoglobulin
superfamily, being distinguished both by the number of
Ig-like type C2 and fibronectin III domains (10 and 6,
respectively) and by the interruption of the fourth and
fifth fibronectin domains by a 10th C2 domain, the
functional significance of which may be of interest. The
novel structure of DS-CAM and its expression throughout
the nervous system during differentiation suggest
roles for this neural CAM in neural development and
function. The location of DS-CAM in a region critical
for Down syndrome (DS) neurocognitive phenotypes provides
a human model in which to test the significance of these
roles for cognitive function.

The neural Ig-superfamily members play critical
roles in neural development and function and have been
implicated in cell migration and sorting, axon guidance
and fasciculation, formation of neural connections, and
synaptic plasticity (Edelman and Crossin, supra, 1991;
Walsh and Doherty, supra, 1993; Tessier-Lavigne et al.,
Science 274:1123-1133, 1996; Shuster et al., Neuron
17:641-654, 1996; Shuster et al., Neuron 17:655-657,
1996). These activities are mediated by the homophilic
or heterophilic binding properties of Ig-superfamily
members (Mauro et al., J. Cell Biol. 119:191-202, 1992,
and Milev et al., J. Biol. Chem. 271:15716-15723, 1996),
the binding of Ig-superfamily proteins to extracellular
matrix proteins (Grumet et al., Cell Adhesion Comm.
1:177-190, 1993; Taira et al., Neuron 12:861-872, 1994;
and Zisch et al., J. Cell Biol. 119:203-213, 1992), and
the binding to smaller diffusible chemorepellents or
chemoattractants, for example, the binding of DCC to
netrin (Keino-Masu et al., Cell 87:175-185, 1996).
The specificity of DS-CAM expression for the
central nervous system and the timing of its expression
to the period of neurite outgrowth in both the central
and peripheral nervous systems indicate a role for
DS-CAM in early development and differentiation (Examples
4 and 5). Early in development, with the exception
of neural crest precursors, expression is clearly absent
from regions that contain dividing neuroepithelial
precursors, such as the ependymal layer of the neural tube
and the ventricular zone of the brain (Altman and Bayer,
Atlas of Prenatal Rat Brain Development, CRC Press, Ann
Arbor, MI, 1995). In the embryo, differentiated neurons
express DS-CAM during neurite outgrowth, once they have
finished migrating to their proper positions within the
neuroepithelium.

Neural crest cells may express DS-CAM while
they are migrating. At 15.5 and 16.5 days pc, most of
the neural crest-derived tissues show some expression,
although not all have finished migration. The continued
expression of DS-CAM in the myenteric plexus after
15.5-16.5 dpc is due to the neural crest cells that have
stopped dividing, although others are still in the cell
cycle. Approximately 50% of myenteric ganglion neurons
arise after birth, and DS-CAM may be expressed later in
this subset. At later stages, the data suggest that DS-CAM
is down-regulated in neural crest derivatives such as
the myenteric ganglia and ganglia of the pancreas.
DS-CAM expression in tissues derived from the neural
crest is of interest with respect to the high level
detected in the umbilical cord. The tissue surrounding
the umbilical artery and vein is derived from the neural
crest and functions in coordinating the cardiovascular
changes occurring at birth. The expression detected in
the fetal liver and branchial arches is also derived from
neural crest related to the ductus venosus and ultimately
the ductus arteriosus and cardiac outflow tracts,
respectively.

DS-CAM expression continues post-natally in
the differentiating regions of the newborn brain, such
as the septum and inferior colliculus, and in the adult
in regions associated with plasticity, such as the
olfactory bulb and hippocampus. When combined with the
evidence for involvement of the Ig superfamily in
determining synaptic strength (Mayford et al., Science
256:638-644, 1992), the continued expression supports a
role for DS-CAM in remodeling, learning and memory. The
expression pattern and the role of dendritic connections
in cell body maintenance indicate that an increase in
DS-CAM expression in DS brain is responsible in part for
the abnormalities of dendritic structure and decreased
intersections seen at four months post-natal in DS
individuals.

Alternatively spliced variants of CAMs have
distinct roles in different parts of the brain, as
demonstrated for closely related Ig-superfamily members,
such as NCAM (Cunningham et al., Science 236:799-806,
1987, and Figarella-Branger et al., J. Neuropathol. Exp.
Neurol. 51:12-23, 1992). The differential expression of
alternatively spliced DS-CAM transcripts encoding DS-CAM1
(SEQ ID NO: 2) and DS-CAM2 (SEQ ID NO: 11) has likewise
been observed in various parts of the human adult brain.
For example, it has been found that DS-CAM clones
encoding DS-CAM2 contain a small deletion relative to
DS-CAM1, which deletion contains the transmembrane domain
(Example 3 and Figure 3) and results in a stop codon 36
bp downstream. The results of RT-PCR (Example 5)
indicated that all RNAs tested from various human tissues
expressed both the DS-CAM1 and DS-CAM2 transcripts and
that the PCR products had the sequence and size
predicted for the appropriate form. The proximal and
distal borders of the deletion are located within
neighboring exons and reveal variant consensus splice
site sequences (Jackson, Nucl. Acids Res. 19:3795-3798,
1991) with further surrounding homology to the U1
spliceosomal RNA.
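The observation that the DS-CAM2 deletion "results in a stop codon 36 bp downstream" reflects a general mechanism: one common way a deletion places a stop codon shortly after its junction is by removing a number of bases that is not a multiple of three, which shifts the reading frame. The text does not state the length of the DS-CAM2 deletion, so the following Python sketch is illustrative only. It assumes the standard stop codons TAA, TAG and TGA, uses hypothetical helper names (`splice_out`, `first_in_frame_stop`), and runs on an invented toy sequence, not DS-CAM sequence.

```python
from typing import Optional

# Standard-code stop codons, DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_out(seq: str, start: int, end: int) -> str:
    """Remove the 0-based, half-open interval [start, end) from seq."""
    return seq[:start] + seq[end:]


def first_in_frame_stop(cds: str, frame_start: int = 0) -> Optional[int]:
    """Return the 0-based position of the first stop codon read in frame
    from frame_start, or None if the sequence ends without one."""
    seq = cds.upper()
    for i in range(frame_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Toy coding sequence (not DS-CAM): ATG AAA CTG ATT TAA
cds = "ATGAAACTGATTTAA"
print(first_in_frame_stop(cds))  # 12: the original stop codon

# Deleting a single base shifts the frame: ATG AAC TGA ...
mutant = splice_out(cds, 5, 6)
print(mutant, first_in_frame_stop(mutant))  # ATGAACTGATTTAA 6
```

Scanning codon by codon from a known frame start keeps the sketch simple; a real analysis would locate the deletion junction by aligning the cDNA against genomic sequence, as described for Figure 3.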

From Northern analyses (Example 4), a minimum of
three distinct transcripts are recognized by a probe for
the transmembrane domain. From cDNA sequence analyses
(Example 5), two forms of the DS-CAM protein are deduced:
one that generates a transmembrane adhesion molecule and
a second that is deleted for the transmembrane domain,
thereby generating a molecule that is transported to the
extracellular matrix. This mode of generating
extracellular and membrane-bound forms of CAMs is in
surprising contrast to the GPI
(glycosylphosphatidylinositol) linkage used by most CAMs,
and would provide a way of generating longer-range
homophilic interactions between cells and the
extracellular matrix, which may be significant for cell
migration.

The DS-CAM gene was isolated (as described in
the Examples hereinafter) by using the BAC contig on
21q22.2-q22.3 covering the region between D21S55 and MX1
(Hubert et al., Genomics 41:218-226, 1997). The gene
spans a minimum of 900 kb, estimated by summing the sizes
of the non-overlapping BACs and PACs covered by the
DS-CAM gene (Figure 1). The DS-CAM gene covers a gap
in all physical maps of this region. Because hybridization
experiments showed no signal of the complete cDNA on
BAC 277G10, which covers 210 kb, a 5' intron is at least
this size, similar to the first intron of the DCC gene
(Cho et al., Genomics 19:525-531, 1994). Alternatively,
other alternative transcripts can contain exons located
in this BAC. The gene spans the boundary of bands
21q22.2 and q22.3, a Giemsa-dark and a Giemsa-light band,
respectively. The location within the same band 21q22.2
of the gene for PEP19, a small 634 bp gene with large
introns (Cabin et al., Somat. Cell Mol. Genet. 22:167-175,
1996), suggests a general structure of genes in G-bands
having large introns.

The nucleic acid molecules described herein are
useful for producing invention DS-CAM proteins, when such
nucleic acids are incorporated into a variety of protein
expression systems known to those of skill in the art.
In addition, such nucleic acid molecules or fragments
thereof can be labeled with a readily detectable
substituent and used as hybridization probes for assaying
for the presence and/or amount of a DS-CAM gene or mRNA
transcript in a given sample. The nucleic acid molecules
described herein, and fragments thereof, are also useful
as primers and/or templates in a PCR reaction for
amplifying genes encoding the invention protein described
herein.

The term "nucleic acid" (also referred to as
polynucleotides) encompasses ribonucleic acid (RNA) or
deoxyribonucleic acid (DNA), probes, oligonucleotides,
and primers. DNA can be either complementary DNA (cDNA)
or genomic DNA, e.g., a gene encoding a DS-CAM protein.
One means of isolating a nucleic acid encoding a DS-CAM
polypeptide is to probe a mammalian genomic library with
a natural or artificially designed DNA probe using
methods well known in the art. DNA probes derived from
the DS-CAM gene are particularly useful for this purpose.
DNA and cDNA molecules that encode DS-CAM polypeptides
can be used to obtain complementary genomic DNA, cDNA or
RNA from mammalian (e.g., human, mouse, rat, rabbit, pig,
and the like) or other animal sources, or to isolate
related cDNA or genomic clones by the screening of cDNA
or genomic libraries, by methods described in more detail
below. Examples of nucleic acids are RNA, cDNA, or
isolated genomic DNA encoding a DS-CAM polypeptide. Such
nucleic acids may include, but are not limited to,
nucleic acids having substantially the same nucleotide
sequence as set forth in SEQ ID NO: 1, SEQ ID NO: 7,
SEQ ID NO: 8, SEQ ID NO: 9, or SEQ ID NO: 10, or at least
nucleotides 453-6185 set forth in SEQ ID NO: 1, or
nucleotides 453-5168 set forth in SEQ ID NO: 10.

Use of the terms "isolated" and/or "purified"
in the present specification and claims as a modifier of
DNA, RNA, polypeptides or proteins means that the DNA,
RNA, polypeptides or proteins so designated have been
produced in such form by the hand of man, and thus are
separated from their native in vivo cellular environment.
As a result of this human intervention, the recombinant
DNAs, RNAs, polypeptides and proteins of the invention
are useful in ways described herein that the DNAs, RNAs,
polypeptides or proteins as they naturally occur are not.

As used herein, "mammalian" refers to the
variety of species from which the invention DS-CAM
protein is derived, e.g., human, rat, mouse, rabbit,
monkey, baboon, bovine, porcine, ovine, canine, feline,
and the like. A preferred DS-CAM protein herein is
human DS-CAM.

In one embodiment of the present invention,
cDNAs encoding the invention DS-CAM proteins disclosed
herein include substantially the same nucleotide sequence
as set forth in SEQ ID NO: 1, SEQ ID NO: 7, SEQ ID NO: 8,
SEQ ID NO: 9, or SEQ ID NO: 10. Preferred cDNA molecules
encoding the invention proteins include the same
nucleotide sequence as nucleotides 453-6185 set forth in
SEQ ID NO: 1, or nucleotides 453-5168 set forth in
SEQ ID NO: 10.
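Nucleotide ranges such as 453-6185 are conventionally 1-based and inclusive, which differs from Python's 0-based, half-open slicing. The Python sketch below assumes that convention (the text does not state it explicitly), shows the conversion with a hypothetical helper `extract_range`, and checks that both preferred ranges span a whole number of codons. No SEQ ID sequence is reproduced here.

```python
def range_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based, inclusive range."""
    return end - start + 1


def extract_range(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) of seq."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range lies outside the sequence")
    return seq[start - 1:end]  # convert to a 0-based, half-open slice


for label, (start, end) in {"SEQ ID NO: 1": (453, 6185),
                            "SEQ ID NO: 10": (453, 5168)}.items():
    n = range_length(start, end)
    print(label, n, "nt,", n // 3, "codons, remainder", n % 3)
```

Both ranges come out as exact multiples of three (5733 = 3 × 1911 and 4716 = 3 × 1572), consistent with an open reading frame; whether each range includes the terminal stop codon is not stated here.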
As employed herein, the term "substantially the
same nucleotide sequence" refers to DNA having sufficient
identity to the reference polynucleotide that it
will hybridize to the reference nucleotide sequence under
moderately stringent hybridization conditions. In one
embodiment, DNA having substantially the same nucleotide
sequence as the reference nucleotide sequence encodes
substantially the same amino acid sequence as that set
forth in SEQ ID NO: 2 or SEQ ID NO: 11, or the DS-CAM
coding region of SEQ ID NO: 7, SEQ ID NO: 8 or SEQ ID NO: 9,
or a larger amino acid sequence including SEQ ID NO: 2 or
SEQ ID NO: 11, or the DS-CAM coding region of SEQ ID NO: 7,
SEQ ID NO: 8 or SEQ ID NO: 9. In another embodiment, DNA
having "substantially the same nucleotide sequence" as
the reference nucleotide sequence has at least 60%
identity with respect to the reference nucleotide
sequence. DNA having at least 70%, more preferably at
least 90%, yet more preferably at least 95%, identity to
the reference nucleotide sequence is preferred.
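The identity thresholds above (60%, 70%, 90%, 95%) presuppose a way of scoring identity that the text does not define. A minimal Python sketch, assuming the two sequences are already aligned to equal length and that identity is the fraction of aligned positions holding the same base, with gap positions (written `-`) counted as mismatches. Real comparisons would first run an alignment program, which this sketch does not attempt; the helper names are hypothetical.

```python
from typing import List

THRESHOLDS = (60, 70, 90, 95)  # percent identity tiers named in the text


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which a and b carry the same base.

    Assumes a and b are pre-aligned to the same length; '-' marks a gap
    and never counts as a match.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


def thresholds_met(identity: float) -> List[int]:
    """Return the tiers from THRESHOLDS that the identity value reaches."""
    return [t for t in THRESHOLDS if identity >= t]


pid = percent_identity("ACGTACGTAC", "ACGTTCGTAA")
print(pid, thresholds_met(pid))  # 80.0 [60, 70]
```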

This invention also encompasses nucleic acids
which differ from the nucleic acids shown in SEQ ID NO: 1,
SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, or SEQ ID NO: 10
but which have the same phenotype. Phenotypically similar
nucleic acids are also referred to as "functionally
equivalent nucleic acids". As used herein, the phrase
"functionally equivalent nucleic acids" encompasses
nucleic acids characterized by slight and
non-consequential sequence variations that will function in
substantially the same manner to produce the same protein
product(s) as the nucleic acids disclosed herein. In
particular, functionally equivalent nucleic acids encode
polypeptides that are the same as those disclosed herein
or that have conservative amino acid variations, or that
encode larger polypeptides that include SEQ ID NO: 2 or
SEQ ID NO: 11, or the DS-CAM coding region of SEQ ID NO: 7,
SEQ ID NO: 8 or SEQ ID NO: 9. For example, conservative
variations include substitution of a non-polar residue
with another non-polar residue, or substitution of a
charged residue with a similarly charged residue. These
variations include those recognized by skilled artisans
as those that do not substantially alter the tertiary
structure of the protein.
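The two examples of conservative variation given above can be expressed as a residue-class lookup. The Python sketch below rests on a simplified assumption, not a definition from the text: it treats G, A, V, L, I, M, F, W and P as non-polar, K, R and H as positively charged, and D and E as negatively charged, and it recognizes only substitutions within one of those classes. Other widely used schemes, such as substitution matrices, group residues differently; the function names are hypothetical.

```python
# Simplified residue classes (one-letter codes); an illustrative assumption.
RESIDUE_CLASSES = {
    "nonpolar": set("GAVLIMFWP"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same class above."""
    return any(original in members and replacement in members
               for members in RESIDUE_CLASSES.values())


def nonconservative_positions(reference: str, variant: str) -> list:
    """0-based positions where equal-length protein sequences differ
    by a substitution that is not conservative under RESIDUE_CLASSES."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [i for i, (r, v) in enumerate(zip(reference, variant))
            if r != v and not is_conservative(r, v)]


# K->R stays within the positive class; E->K swaps charge.
print(nonconservative_positions("MKLE", "MRLK"))  # [3]
```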

Further provided are nucleic acids encoding
DS-CAM polypeptides that, by virtue of the degeneracy of
the genetic code, do not necessarily hybridize to the
invention nucleic acids under specified hybridization
conditions. Preferred nucleic acids encoding the
invention polypeptides comprise nucleotides that
encode substantially the same amino acid sequences set
forth in SEQ ID NO: 2 or SEQ ID NO: 11, or the DS-CAM
coding region of SEQ ID NO: 7, SEQ ID NO: 8 or SEQ ID NO: 9.

Thus, an exemplary nucleic acid encoding an
invention DS-CAM protein may be selected from:

(a) DNA encoding the amino acid sequence
set forth in SEQ ID NO: 2 or SEQ ID NO: 11, or
the DS-CAM coding region of SEQ ID NO: 7,
SEQ ID NO: 8 or SEQ ID NO: 9,

(b) DNA that hybridizes to the DNA of (a) 
under moderately stringent conditions, wherein 
said DNA encodes biologically active DS-CAM, or 

(c) DNA degenerate with respect to either 
(a) or (b) above, wherein said DNA encodes 
biologically active DS-CAM. 

Hybridization refers to the binding of
complementary strands of nucleic acid (i.e.,
sense:antisense strands or probe:target-DNA) to each
other through hydrogen bonds, similar to the bonds that
naturally occur in chromosomal DNA. Stringency levels
used to hybridize a given probe with target-DNA can be
readily varied by those of skill in the art.
The phrase "stringent hybridization" is used
herein to refer to conditions under which polynucleic
acid hybrids are stable. As known to those of skill in
the art, the stability of hybrids is reflected in the
melting temperature (Tm) of the hybrids. In general, the
stability of a hybrid is a function of sodium ion
concentration and temperature. Typically, the
hybridization reaction is performed under conditions of
lower stringency, followed by washes of varying, but
higher, stringency. Reference to hybridization
stringency relates to such washing conditions.

As used herein, the phrase "moderately
stringent hybridization" refers to conditions that permit
target-DNA to bind a complementary nucleic acid that has
about 60% identity, preferably about 75% identity, more
preferably about 85% identity to the target DNA, with
greater than about 90% identity to target-DNA being
especially preferred. Preferably, moderately stringent
conditions are conditions equivalent to hybridization in
50% formamide, 5X Denhardt's solution, 5X SSPE, 0.2% SDS
at 42°C, followed by washing in 0.2X SSPE, 0.2% SDS, at
65°C.

The phrase "high stringency hybridization"
refers to conditions that permit hybridization of only
those nucleic acid sequences that form stable hybrids in
0.018M NaCl at 65°C (i.e., if a hybrid is not stable in
0.018M NaCl at 65°C, it will not be stable under high
stringency conditions, as contemplated herein). High
stringency conditions can be provided, for example, by
hybridization in 50% formamide, 5X Denhardt's solution,
5X SSPE, 0.2% SDS at 42°C, followed by washing in 0.1X
SSPE and 0.1% SDS at 65°C.

The phrase "low stringency hybridization"
refers to conditions equivalent to hybridization in 10%
formamide, 5X Denhardt's solution, 6X SSPE, 0.2% SDS at
42°C, followed by washing in 1X SSPE, 0.2% SDS, at 50°C.
Denhardt's solution and SSPE (see, e.g., Sambrook et al.,
Molecular Cloning, A Laboratory Manual, Cold Spring
Harbor Laboratory Press, 1989) are well known to those of
skill in the art, as are other suitable hybridization
buffers.



As used herein, the term "degenerate" refers to
codons that differ in at least one nucleotide from a
reference nucleic acid, e.g., SEQ ID NO: 1, but encode the
same amino acids as the reference nucleic acid. For
example, codons specified by the triplets "UCU", "UCC",
"UCA", and "UCG" are degenerate with respect to each
other since all four of these codons encode the amino
acid serine.
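The definition of "degenerate" codons can be checked mechanically against the standard genetic code. The Python sketch below builds the 64-codon table from the usual compact encoding (bases ordered U, C, A, G; `*` marks a stop) and tests whether two codons differ yet encode the same amino acid. The function names are hypothetical, and T is accepted as a synonym for U so that DNA codons can be passed in.

```python
BASES = "UCAG"
# Amino acids for codons in UCAG x UCAG x UCAG order (standard code; '*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def _rna(codon: str) -> str:
    """Normalize a codon to upper-case RNA letters."""
    return codon.upper().replace("T", "U")


def are_degenerate(codon_a: str, codon_b: str) -> bool:
    """True if the codons differ in at least one base but encode the same residue."""
    a, b = _rna(codon_a), _rna(codon_b)
    return a != b and CODON_TABLE[a] == CODON_TABLE[b]


def synonymous_codons(amino_acid: str) -> list:
    """All codons encoding the given one-letter amino acid, in table order."""
    return [c for c, aa in CODON_TABLE.items() if aa == amino_acid]


print(are_degenerate("UCU", "UCG"))  # True
print(synonymous_codons("S"))  # ['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC']
```

The standard code also assigns serine to AGU and AGC, so serine has six mutually degenerate codons, of which the example above lists four.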



Preferred nucleic acids encoding the invention
polypeptide(s) hybridize under moderately stringent,
preferably high stringency, conditions to substantially
the entire sequence, or in certain embodiments
substantial portions (i.e., typically at least 15-30
nucleotides), of the nucleic acid sequence set forth in
SEQ ID NO: 1, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9 or
SEQ ID NO: 10.



The invention nucleic acids can be produced by
a variety of methods well-known in the art, e.g., the
methods described herein, employing PCR amplification
using oligonucleotide primers from various regions of
SEQ ID NO: 1, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9,
SEQ ID NO: 10, and the like.

In accordance with a further embodiment of the
present invention, optionally labeled DS-CAM-encoding
cDNAs, or fragments thereof, can be employed to probe
libraries (e.g., cDNA, genomic, and the like) for
additional nucleic acid sequences encoding novel
mammalian DS-CAM proteins. Construction of mammalian
cDNA libraries, preferably a human trisomy 21 fetal brain
cDNA library, is well-known in the art (see Example 3).
Screening of such a cDNA library is
initially carried out under low-stringency conditions,
which comprise a temperature of less than about 42°C, a
formamide concentration of less than about 50%, and a
moderate to low salt concentration.

Presently preferred probe-based screening
conditions comprise a temperature of about 37°C, a
formamide concentration of about 20%, and a salt
concentration of about 5X standard saline citrate (SSC;
20X SSC contains 3 M sodium chloride, 0.3 M sodium citrate,
pH 7.0). Such conditions will allow the identification
of sequences which have a substantial degree of
similarity with the probe sequence, without requiring
perfect homology. The phrase "substantial similarity"
refers to sequences which share at least 50% homology.
Preferably, hybridization conditions will be selected
which allow the identification of sequences having at
least 70% homology with the probe, while discriminating
against sequences which have a lower degree of homology
with the probe. As a result, nucleic acids having
substantially the same nucleotide sequence as nucleotides
453-6185 set forth in SEQ ID NO: 1, nucleotides
453-5168 set forth in SEQ ID NO: 10, SEQ ID NO: 7,
SEQ ID NO: 8, or SEQ ID NO: 9 are obtained.
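The SSC figures above reduce to proportional arithmetic: an nX solution carries n/20 of the 20X stock concentrations, and the volume of stock needed scales the same way. A minimal Python sketch using only the 20X composition stated in the text (3 M sodium chloride, 0.3 M sodium citrate); the function names are hypothetical.

```python
STOCK_X = 20.0  # the stock is 20X SSC
STOCK_MOLARITY = {"sodium chloride": 3.0, "sodium citrate": 0.3}  # mol/L in 20X SSC


def ssc_molarity(target_x: float) -> dict:
    """Molar concentration of each component in an nX SSC solution."""
    return {name: m * target_x / STOCK_X for name, m in STOCK_MOLARITY.items()}


def stock_volume_ml(target_x: float, final_volume_ml: float) -> float:
    """Volume of 20X stock needed to make final_volume_ml of nX SSC."""
    return final_volume_ml * target_x / STOCK_X


print(ssc_molarity(5))            # 5X SSC: about 0.75 M NaCl and 0.075 M citrate
print(stock_volume_ml(5, 100.0))  # 25.0 mL of 20X stock per 100 mL
```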

As used herein, a nucleic acid "probe" is
single-stranded DNA or RNA, or analogs thereof, that has
a sequence of nucleotides that includes at least 14, at
least 20, at least 50, at least 100, at least 200, at
least 300, at least 400, or at least 500 contiguous bases
that are the same as (or the complement of) any
contiguous bases set forth in any of SEQ ID NO:1,
SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9 or SEQ ID NO:10.
Preferred regions from which to construct probes include
5' and/or 3' coding regions of SEQ ID NO:1, SEQ ID NO:7,
SEQ ID NO:8, SEQ ID NO:9 or SEQ ID NO:10. In addition,
the entire cDNA encoding region of an invention DS-CAM
protein, or the entire sequence corresponding to SEQ ID
NO:1, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9 or SEQ ID
NO:10, may be used as a probe. Probes may be labeled by
methods well known in the art, as described hereinafter,
and used in various diagnostic kits.
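The sequence part of the probe definition above reduces to a string test. A candidate qualifies if it is at least 14 bases long and is identical to a contiguous stretch of the reference sequence or of its complementary strand. The Python sketch below assumes that "complement" means the antiparallel (reverse) complement and treats U and T as equivalent. The function names are hypothetical, and the example uses a made-up placeholder sequence rather than any SEQ ID NO. The sketch checks only the sequence criterion and says nothing about actual hybridization behavior.

```python
# Minimum probe length named in the text (at least 14 contiguous bases).
MIN_PROBE_LENGTH = 14

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq: str) -> str:
    """Upper-case a sequence and write RNA uracil as thymine for comparison."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Antiparallel complement of a DNA/RNA sequence, written 5'->3'."""
    return normalize(seq).translate(_DNA_COMPLEMENT)[::-1]


def meets_probe_definition(probe: str, reference: str,
                           min_length: int = MIN_PROBE_LENGTH) -> bool:
    """True if `probe` is at least `min_length` bases long and matches a
    contiguous stretch of `reference` or of its complementary strand."""
    probe, reference = normalize(probe), normalize(reference)
    if len(probe) < min_length:
        return False
    return probe in reference or probe in reverse_complement(reference)


if __name__ == "__main__":
    # Placeholder sequence for illustration only; NOT taken from any SEQ ID NO.
    reference = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    print(meets_probe_definition(reference[2:18], reference))  # True
    print(meets_probe_definition(reverse_complement(reference[5:21]), reference))  # True
    print(meets_probe_definition(reference[0:10], reference))  # False: too short
```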



As used herein, the terms "label" and
"indicating means" in their various grammatical forms
refer to single atoms and molecules that are either
directly or indirectly involved in the production of a
detectable signal. Any label or indicating means can be
linked to invention nucleic acid probes, expressed
proteins, polypeptide fragments, or antibody molecules.
These atoms or molecules can be used alone or in
conjunction with additional reagents. Such labels are
themselves well known in clinical diagnostic chemistry.



The labeling means can be a fluorescent
labeling agent that chemically binds to antibodies or
antigens without denaturation to form a fluorochrome
(dye) that is a useful immunofluorescent tracer. A
description of immunofluorescent analytic techniques is
found in DeLuca, "Immunofluorescence Analysis", in
Antibody As a Tool, Marchalonis et al., eds., John Wiley
& Sons, Ltd., pp. 189-231, 1982, which is incorporated
herein by reference.

In one embodiment, the indicating group is an
enzyme, such as horseradish peroxidase (HRP), glucose
oxidase, and the like. In another embodiment,
radioactive elements are employed as labeling agents. The
linking of a label to a substrate, i.e., labeling of
nucleic acid probes, antibodies, polypeptides, and
proteins, is well known in the art. For instance, an
invention antibody can be labeled by metabolic
incorporation of radiolabeled amino acids provided in the
culture medium. See, for example, Galfre et al., Meth.
Enzymol. 73:3-46, 1981. Conventional means of protein
conjugation or coupling by activated functional groups
are particularly applicable. See, for example, Avrameas
et al., Scand. J. Immunol. 8(7):7-23, 1978; Rodwell et
al., Biotech. 3:889-894, 1984; and U.S. Patent
No. 4,493,795.

In accordance with another embodiment of the
present invention, there are provided isolated mammalian
DS-CAM proteins (preferably human), polypeptides, and
fragments thereof encoded by invention nucleic acids.
Preferably, DS-CAM proteins referred to herein are those
polypeptides specifically recognized by an antibody that
also specifically recognizes a DS-CAM protein including
the sequence set forth in SEQ ID NO:2 or SEQ ID NO:11, or
the DS-CAM coding region of SEQ ID NO:7, SEQ ID NO:8 or
SEQ ID NO:9. Invention isolated DS-CAM proteins are free
of cellular components and/or contaminants normally
associated with a native in vivo environment.

The invention DS-CAM proteins are further
characterized as being primarily expressed in fetal brain
and not expressed in fetal lung or fetal liver. For
example, the results of Northern analysis (described in
Example 4) using human fetal tissues showed that 8.5 kb
and 7.6 kb transcripts are expressed only in fetal brain
and not in fetal lung, fetal liver or fetal
kidney. Northern blot analyses of adult tissues
revealed differential expression of three alternative
transcripts of 9.7 kb, 8.5 kb and 7.6 kb in different
substructures of the brain. The 9.7 kb transcript is
highly expressed in the substantia nigra, moderately
expressed in the amygdala and hippocampus, and expressed
at lower levels in whole brain. A similar pattern is
observed using a PCR product spanning the 191 bp
deletion found in DS-CAM-18 and DS-CAM-52. The placenta
shows faint bands, which are smaller than those
in brain. In skeletal muscle, a faint band (G.5 kb) is
detected.

The results of RT-PCR (Example 5) demonstrated
expression of human DS-CAM mRNA in fetal and adult brain
and in fetal kidney, as well as in mRNA from a breast
carcinoma cell line. Thus, splice variant cDNA transcripts
encoding a DS-CAM family of proteins are clearly
contemplated by the present invention.

The region of chromosome locus 21q22.2 from
which DS-CAM is derived is part of the candidate region
for holoprosencephaly type I (HPE1). In addition, some
patients hemizygous for a deletion of this region show
abnormalities of the corpus callosum and schizencephaly.
Therefore, DS-CAM is contemplated as the gene that, when
defective, deleted or present as a duplication, is
responsible for holoprosencephaly, agenesis of the corpus
callosum and/or structural defects of the brain. In
addition, DS-CAM may also be responsible for several
phenotypes of Down Syndrome, including mental retardation
as well as, more specifically, the abnormal dendritic
structure observed in Down Syndrome. Additional roles
for DS-CAM were further evaluated by database homology
searches using BLAST X/N and TIGR database analyses.
Results of these searches indicate that DS-CAM shows
moderate homology to N-CAM-1 (Cunningham et al., Science
236:799-806, 1987) and to DCC (Fearon et al., Science
247:49-56, 1990).

Presently preferred DS-CAM proteins of the
invention include amino acid sequences that are
substantially the same as the protein sequence set forth
in SEQ ID NO:2 or SEQ ID NO:11, or the DS-CAM coding
region of SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9, as
well as biologically active, modified forms thereof.
Those of skill in the art will recognize that numerous
residues of the above-described sequences can be
substituted with other chemically, sterically and/or
electronically similar residues without substantially
altering the biological activity of the resulting
receptor species. In addition, larger or smaller
polypeptide sequences containing substantially the same
sequence as SEQ ID NO:2 or SEQ ID NO:11, or the DS-CAM
coding region of SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9
(e.g., splice variants), are contemplated.
As employed herein, the term "substantially the
same amino acid sequence" refers to amino acid sequences
having at least about 50%, preferably at least about 60%,
more preferably at least about 70% identity with respect
to the reference amino acid sequence, and retaining
comparable functional and biological activity
characteristic of the protein defined by the reference
amino acid sequence. In another embodiment of the
invention, preferred invention proteins having
"substantially the same amino acid sequence" will have at
least about 80%, more preferably 90%, amino acid identity
with respect to the reference amino acid sequence, with
greater than about 95% amino acid sequence identity being
especially preferred. It is recognized, however, that
polypeptides (or nucleic acids referred to hereinbefore)
containing less than the described levels of sequence
identity arising as splice variants or that are modified
by conservative amino acid substitutions, or by
substitution of degenerate codons, are also encompassed
within the scope of the present invention.
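The text does not state how percent identity is computed, for example which alignment algorithm is used or how gaps are counted. The following Python sketch is therefore only illustrative. It assumes the two sequences are already aligned and counts identical non-gap residues over all alignment columns, ignoring columns that are gaps in both. It then reports the highest recited tier reached, treating each "about" threshold as a hard cutoff. The function names and toy peptides are hypothetical. The sketch also does not model the separate requirement of comparable biological activity.

```python
# Identity tiers recited in the text, highest first (percent).
IDENTITY_TIERS = (95.0, 90.0, 80.0, 70.0, 60.0, 50.0)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical, non-gap residues divided by the number of
    alignment columns, ignoring columns that are gaps in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def highest_tier_met(identity: float):
    """Highest recited identity tier reached, or None if below 50%."""
    for tier in IDENTITY_TIERS:
        if identity >= tier:
            return tier
    return None


if __name__ == "__main__":
    # Toy peptides for illustration only; not taken from SEQ ID NO:2 or 11.
    pid = percent_identity("MKT-AYIAKQ", "MKTLAYLAKQ")
    print(f"{pid:.1f}% identity, tier {highest_tier_met(pid)}")
```

Under this convention the toy pair shares 8 of 10 columns (80%). A different gap convention, such as dividing by the shorter ungapped length, would give a different figure for the same alignment.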
The term "biologically active" or "functional",
when used herein as a modifier of invention DS-CAM
protein(s), or polypeptide fragments thereof, refers to a
polypeptide that exhibits functional characteristics
similar to DS-CAM. For example, one biological activity
of DS-CAM is the ability to act as an immunogen for the
production of polyclonal and monoclonal antibodies that
bind specifically to DS-CAM. Thus, an invention nucleic
acid encoding DS-CAM will encode a polypeptide
specifically recognized by an antibody that also
specifically recognizes the DS-CAM protein including the
sequence set forth in SEQ ID NO:2 or SEQ ID NO:11, or the
DS-CAM coding region of SEQ ID NO:7, SEQ ID NO:8 or
SEQ ID NO:9. Such activity may be assayed by any method
known to those of skill in the art. For example, a
test polypeptide encoded by a DS-CAM cDNA can be used to
produce antibodies, which are then assayed for their
ability to bind to the protein including the sequence set
forth in SEQ ID NO:2 or SEQ ID NO:11, or the DS-CAM
coding region of SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9.
If the antibody binds to the test polypeptide and to the
protein including the sequence set forth in SEQ ID NO:2
or SEQ ID NO:11, or the DS-CAM coding region of
SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9, with
substantially the same affinity, then the polypeptide
possesses the requisite biological activity.

The invention DS-CAM proteins can be isolated
by a variety of methods well known in the art, e.g., the
methods described herein, the recombinant expression
systems described herein, precipitation, gel filtration,
ion-exchange, reverse-phase and affinity chromatography,
and the like. Other well-known methods are described in
Deutscher et al., Guide to Protein Purification: Methods
in Enzymology 182 (Academic Press, 1990), which is
incorporated herein by reference. Alternatively, the
isolated polypeptides of the present invention can be
obtained using well-known recombinant methods as
described, for example, in Sambrook et al., supra, 1989.

An example of the means for preparing the
invention polypeptide(s) is to express nucleic acids
encoding DS-CAM in a suitable host cell, such as a
bacterial cell, a yeast cell, an amphibian cell (e.g., an
oocyte), or a mammalian cell, using methods well known in
the art, and to recover the expressed polypeptide, again
using well-known methods. Invention polypeptides can be
isolated directly from cells that have been transformed
with expression vectors as described herein. The
invention polypeptide, biologically active fragments, and
functional equivalents thereof can also be produced by
chemical synthesis. For example, synthetic polypeptides
can be produced using an Applied Biosystems, Inc. Model 430A
or 431A automatic peptide synthesizer (Foster City, CA)
employing the chemistry provided by the manufacturer.

The present invention also provides
compositions containing an acceptable carrier and any of
an isolated, purified DS-CAM polypeptide, an active
fragment thereof, or a purified, mature protein and
active fragments thereof, alone or in combination with
each other. These polypeptides or proteins can be
recombinantly derived, chemically synthesized or purified
from native sources. As used herein, the term
"acceptable carrier" encompasses any of the standard
pharmaceutical carriers, such as phosphate-buffered
saline solution, water and emulsions such as an oil/water
or water/oil emulsion, and various types of wetting
agents.

Also provided are antisense oligonucleotides
having a sequence capable of binding specifically with
any portion of an mRNA that encodes DS-CAM polypeptides
so as to prevent translation of the mRNA. The antisense
oligonucleotide may have a sequence capable of binding
specifically with any portion of the sequence of the cDNA
encoding DS-CAM polypeptides. As used herein, the phrase
"binding specifically" encompasses the ability of a
nucleic acid sequence to recognize a complementary
nucleic acid sequence and to form double-helical segments
therewith via the formation of hydrogen bonds between the
complementary base pairs. An example of an antisense
oligonucleotide is an antisense oligonucleotide
comprising chemical analogs of nucleotides.
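The "binding specifically" definition above rests on antiparallel base pairing. An antisense oligonucleotide written 5'->3' is the base-by-base complement of its mRNA target read in reverse. The following is a minimal Python sketch of that rule plus a simple tiling of candidate oligonucleotides along an mRNA. The function names, the 20-base length and the 5-base step are illustrative assumptions, not values from the text. The sketch does not model nucleotide analogs, cellular uptake or off-target screening.

```python
# Complementary DNA base for each mRNA base (A-T, U-A, G-C, C-G pairing).
_MRNA_TO_ANTISENSE_DNA = str.maketrans("ACGU", "TGCA")


def antisense_dna(mrna_window: str) -> str:
    """5'->3' DNA oligo that pairs antiparallel with a 5'->3' mRNA window.

    A window given in cDNA form (T instead of U) is accepted as well.
    """
    rna = mrna_window.upper().replace("T", "U")
    return rna.translate(_MRNA_TO_ANTISENSE_DNA)[::-1]


def tile_antisense(mrna: str, length: int = 20, step: int = 5):
    """Yield (start, oligo) candidates tiled along an mRNA.

    The default length and step are illustrative assumptions only.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    for start in range(0, len(mrna) - length + 1, step):
        yield start, antisense_dna(mrna[start:start + length])


if __name__ == "__main__":
    # Placeholder mRNA fragment for illustration; not a DS-CAM sequence.
    for start, oligo in tile_antisense("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUG"):
        print(start, oligo)
```

For example, the mRNA window 5'-AUGGCC-3' yields the antisense DNA 5'-GGCCAT-3'.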

Compositions comprising an amount of the 
antisense oligonucleotide, described above, effective to 
reduce expression of DS-CAM polypeptides by passing 
through a cell membrane and binding specifically with 
mRNA encoding DS-CAM polypeptides so as to prevent 
translation and an acceptable hydrophobic carrier capable 
of passing through a cell membrane are also provided 
herein. Suitable hydrophobic carriers are described, for 
example, in U.S. Patent Nos . 5,334,761; 4,889,953; 
4,897,355, and the like. The acceptable hydrophobic 
carrier capable of passing through cell membranes may 
also comprise a structure which binds to a receptor 
specific for a selected cell type and is thereby taken up 
by cells of the selected cell type. The structure may be 
part of a protein known to bind to a cell-type specific 
receptor.

Antisense oligonucleotide compositions are
useful to inhibit translation of mRNA encoding invention
polypeptides. Synthetic oligonucleotides, or other
antisense chemical structures, are designed to bind to
mRNA encoding DS-CAM polypeptides and inhibit translation
of the mRNA, and are useful as compositions to inhibit
expression of DS-CAM associated genes in a tissue sample
or in a subject.
In accordance with another embodiment of the
invention, there are provided kits for detecting mutations,
duplications, deletions, rearrangements and aneuploidies in
chromosome 21 at locus q22.2, comprising at least one
invention probe or antisense oligonucleotide.

The present invention provides means to
modulate levels of expression of DS-CAM polypeptides by
employing synthetic antisense oligonucleotide
compositions (hereinafter SAOC) that inhibit translation
of mRNA encoding these polypeptides. Synthetic
oligonucleotides, or other antisense chemical structures
designed to recognize and selectively bind to mRNA, are
constructed to be complementary to portions of the DS-CAM
coding strand or nucleotide sequences shown in
SEQ ID NO:1, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9 or
SEQ ID NO:10. The SAOC is designed to be stable in the
blood stream for administration to a subject by
injection, or in laboratory cell culture conditions. The
SAOC is designed to be capable of passing through the
cell membrane in order to enter the cytoplasm of the cell,
either by virtue of physical and chemical properties of the
SAOC that render it capable of passing through cell
membranes (for example, by designing small, hydrophobic
SAOC chemical structures), or by virtue of specific
transport systems in the cell that recognize and
transport the SAOC into the cell. In addition, the SAOC
can be designed for administration only to certain
selected cell populations by targeting the SAOC to be
recognized by specific cellular uptake mechanisms that
bind and take up the SAOC only within select cell
populations.

For example, the SAOC may be designed to bind
to a receptor found only in a certain cell type, as
discussed supra. The SAOC is also designed to recognize
and selectively bind to a target mRNA sequence, which may
correspond to a sequence contained within the sequence
shown in SEQ ID NO:1, SEQ ID NO:7, SEQ ID NO:8,
SEQ ID NO:9 or SEQ ID NO:10. The SAOC is designed to
inactivate the target mRNA sequence either by binding thereto
and inducing degradation of the mRNA by, for example,
RNase I digestion, or by inhibiting translation of the mRNA
target sequence by interfering with the binding of
translation-regulating factors or ribosomes, or by inclusion
of other chemical structures, such as ribozyme sequences
or reactive chemical groups, which either degrade or
chemically modify the target mRNA. SAOCs have been shown
to be capable of such properties when directed against
mRNA targets (see Cohen et al., TIBS 10:435, 1989 and
Weintraub, Sci. American, January 1990, p. 40; both
incorporated herein by reference).

In accordance with yet another embodiment of
the present invention, there is provided a method for the
recombinant production of invention DS-CAM protein(s) by
expressing the above-described nucleic acid sequences in
suitable host cells. Recombinant DNA expression systems
that are suitable to produce DS-CAM proteins described
herein are well known in the art. For example, the
above-described nucleotide sequences can be incorporated
into vectors for further manipulation. As used herein, the
term "vector" (or "plasmid") refers to discrete elements that
are used to introduce heterologous DNA into cells for either
expression or replication thereof.

Suitable expression vectors are well known in
the art, and include vectors capable of expressing DNA
operatively linked to a regulatory sequence, such as a
promoter region that is capable of regulating expression
of such DNA. Thus, an expression vector refers to a
recombinant DNA or RNA construct, such as a plasmid, a
phage, recombinant virus or other vector that, upon
introduction into an appropriate host cell, results in
expression of the inserted DNA. Appropriate expression
vectors are well known to those of skill in the art and
include those that are replicable in eukaryotic cells
and/or prokaryotic cells and those that remain episomal
or that integrate into the host cell genome.

As used herein, a promoter region refers to a 
segment of DNA that controls transcription of DNA to 
which it is operatively linked. The promoter region 
includes specific sequences that are sufficient for RNA 
polymerase recognition, binding and transcription 
initiation. In addition, the promoter region includes 
sequences that modulate this recognition, binding and 
transcription initiation activity of RNA polymerase. 
These sequences may be cis-acting or may be responsive to
trans-acting factors. Promoters, depending upon the
nature of the regulation, may be constitutive or
regulated. Exemplary promoters contemplated for use in
the practice of the present invention include the SV40
early promoter, the cytomegalovirus (CMV) promoter, the
mouse mammary tumor virus (MMTV) steroid-inducible
promoter, Moloney murine leukemia virus (MMLV) promoter,
and the like.

As used herein, the term "operatively linked"
refers to the functional relationship of DNA with
regulatory and effector nucleotide sequences, such as
promoters, enhancers, transcriptional and translational
stop sites, and other signal sequences. For example,
operative linkage of DNA to a promoter refers to the
physical and functional relationship between the DNA and
the promoter such that the transcription of such DNA is
initiated from the promoter by an RNA polymerase that
specifically recognizes, binds to and transcribes the
DNA.
As used herein, expression refers to the
process well known to those of skill in the art by which
polynucleic acids are transcribed into mRNA and
translated into peptides or proteins and, optionally
thereafter, modified post-translationally. If the
invention nucleic acid is derived from genomic DNA,
expression may, if an appropriate eukaryotic host cell or
organism is selected, include splicing of the mRNA.

Prokaryotic transformation vectors are 
well-known in the art and include pBluescript and phage 
Lambda ZAP vectors (STRATAGENE, San Diego, CA) , and the 
like. Other suitable vectors and promoters are disclosed 
in detail in U.S. Patent No. 4,798,885, issued January 
17, 1989, the disclosure of which is incorporated herein 
by reference in its entirety. 

Other suitable vectors for transformation of
E. coli cells include the pET expression vectors
(Novagen, see U.S. Patent No. 4,952,496), e.g., pET11a, which
contains the T7 promoter, T7 terminator, the inducible
E. coli lac operator, and the lac repressor gene; and
pET12a-c, which contain the T7 promoter, T7 terminator, and
the E. coli ompT secretion signal. Another suitable
vector is pIN-IIIompA2 (see Duffaud et al., Meth. in
Enzymology 153:492-507, 1987), which contains the lpp
promoter, the lacUV5 promoter operator, the ompA
secretion signal, and the lac repressor gene.

Exemplary eukaryotic transformation vectors
include the cloned bovine papilloma virus genome, the
cloned genomes of the murine retroviruses, and eukaryotic
cassettes, such as the pSV-2 gpt system (described by
Mulligan and Berg, Nature 277:108-114, 1979), the
Okayama-Berg cloning system (Mol. Cell. Biol. 2:161-170,
1982), and the expression cloning vector described by
Genetics Institute (Science 228:810-815, 1985). Such
vectors are available and provide substantial assurance of
at least some expression of the protein of interest in the
transformed eukaryotic cell line.

Particularly preferred base vectors that
contain regulatory elements that can be linked to the
invention DS-CAM-encoding DNAs for transfection of
mammalian cells are cytomegalovirus (CMV) promoter-based
vectors such as pcDNA1 (Invitrogen, San Diego, CA), MMTV
promoter-based vectors such as pMAMNeo (Clontech, Palo
Alto, CA) and pMSG (Pharmacia, Piscataway, NJ), and SV40
promoter-based vectors such as pSVp (Clontech, Palo Alto,
CA).

In accordance with another embodiment of the 
present invention, there are provided "recombinant cells" 
containing the nucleic acid molecules (i.e., DNA or mRNA) 
of the present invention. Methods of transforming 
suitable host cells, preferably bacterial cells, and more 
preferably E. coli cells, as well as methods applicable 
for culturing said cells containing a gene encoding a 
heterologous protein, are generally known in the art. 
See, for example, Sambrook et al., supra, 1989.

Exemplary methods of transformation include, 
e.g., transformation employing plasmids, viral, or 
bacterial phage vectors, transfection, electroporation, 
lipofection, and the like. The heterologous DNA can 
optionally include sequences which allow for its 
extrachromosomal maintenance, or said heterologous DNA 
can be caused to integrate into the genome of the host 
(as an alternative means to ensure stable maintenance in
the host).

Host organisms contemplated for use in the
practice of the present invention include those organisms
in which recombinant production of heterologous proteins
has been carried out. Exemplary cells for introducing
DNA include cells of mammalian origin (e.g., COS cells,
mouse L cells, Chinese hamster ovary (CHO) cells, human
embryonic kidney (HEK) cells, African green monkey cells
and other such cells known to those of skill in the art),
amphibian cells (e.g., Xenopus laevis oocytes), yeast
cells (e.g., Saccharomyces cerevisiae, Candida
tropicalis, Hansenula polymorpha and P. pastoris; see,
e.g., U.S. Patent Nos. 4,882,279, 4,837,148, 4,929,555
and 4,855,231), bacteria (e.g., E. coli), and the like.

In one embodiment, nucleic acids encoding the
invention DS-CAM proteins can be delivered into mammalian
cells, either in vivo or in vitro, using suitable viral
vectors well known in the art. Suitable retroviral
vectors, designed specifically for in vivo "gene therapy"
methods, are described, for example, in WIPO publications
WO 9205266 and WO 9214829, which provide a description of
methods for efficiently introducing nucleic acids into
human cells in vivo. In addition, where it is desirable
to limit or reduce the in vivo expression of the
invention DS-CAM, the introduction of the antisense
strand of the invention nucleic acid is contemplated.

In accordance with yet another embodiment of
the present invention, there are provided anti-DS-CAM
antibodies having specific reactivity with DS-CAM
polypeptides of the present invention. Active fragments
of antibodies are encompassed within the definition of
"antibody". Invention antibodies can be produced by
methods known in the art using invention polypeptides,
proteins or portions thereof as antigens. For example,
polyclonal and monoclonal antibodies can be produced by
methods well known in the art, as described, for example,
in Harlow and Lane, Antibodies: A Laboratory Manual (Cold
Spring Harbor Laboratory, 1988), which is incorporated
herein by reference. Invention polypeptides can be used
as immunogens in generating such antibodies.
Alternatively, synthetic peptides can be prepared (using
commercially available synthesizers) and used as
immunogens. Amino acid sequences can be analyzed by
methods well known in the art to determine whether they
form hydrophobic or hydrophilic domains of the
corresponding polypeptide. Altered antibodies such as
chimeric, humanized, CDR-grafted or bifunctional
antibodies can also be produced by methods well known in
the art. Such antibodies can also be produced by
hybridoma, chemical synthesis or recombinant methods
described, for example, in Sambrook et al., supra, 1989,
and Harlow and Lane, supra, 1988. Both anti-peptide and
anti-fusion protein antibodies can be used (see, for
example, Bahouth et al., Trends Pharmacol. Sci. 12:338,
1991; Ausubel et al., Current Protocols in Molecular
Biology (John Wiley and Sons, NY, 1989), which are
incorporated herein by reference).

Antibody so produced can be used, inter alia,
in diagnostic methods and systems to detect the level of
DS-CAM protein present in a mammalian, preferably human,
body sample, such as tissue or vascular fluid. Such
antibodies can also be used for the immunoaffinity or
affinity chromatography purification of the invention
DS-CAM protein. In addition, methods are contemplated
herein for detecting the presence of DS-CAM polypeptides
on the surface of a cell, comprising contacting the cell
with an antibody that specifically binds to DS-CAM
polypeptides, under conditions permitting binding of the
antibody to the polypeptides, detecting the presence of
the antibody bound to the cell, and thereby detecting the
presence of invention polypeptides on the surface of the
cell. With respect to the detection of such
polypeptides, the antibodies can be used for in vitro
diagnostic or in vivo imaging methods.



Immunological procedures useful for in vitro
detection of target DS-CAM polypeptides in a sample
include immunoassays that employ a detectable antibody.
Such immunoassays include, for example, ELISA, Pandex
microfluorimetric assay, agglutination assays, flow
cytometry, serum diagnostic assays and
immunohistochemical staining procedures, which are well
known in the art. An antibody can be made detectable by
various means well known in the art. For example, a
detectable marker can be directly or indirectly attached
to the antibody. Useful markers include, for example,
radionuclides, enzymes, fluorogens, chromogens and
chemiluminescent labels.

Invention anti-DS-CAM antibodies are
contemplated for use herein to modulate the activity of
the DS-CAM polypeptide in living animals, in humans, or
in biological tissues or fluids isolated therefrom.
Accordingly, compositions comprising a carrier and an
amount of an antibody having specificity for DS-CAM
polypeptides effective to block naturally occurring
ligands or other DS-CAM-binding proteins from binding to
invention DS-CAM polypeptides are contemplated herein.
For example, a monoclonal antibody directed to an epitope
of DS-CAM polypeptide molecules present on the surface of
a cell and having an amino acid sequence substantially
the same as an amino acid sequence for a cell surface
epitope of a DS-CAM polypeptide including the amino acid
sequence shown in SEQ ID NO:2 or SEQ ID NO:11, or the
DS-CAM coding region of SEQ ID NO:7, SEQ ID NO:8 or
SEQ ID NO:9, can be useful for this purpose.

The present invention further provides 
transgenic non-human mammals that are capable of 
expressing exogenous nucleic acids encoding DS-CAM
polypeptides. As employed herein, the phrase "exogenous
nucleic acid" refers to nucleic acid sequence that is
not native to the host, or that is present in the host
in other than its native environment (e.g., as part of a
genetically engineered DNA construct).

Also provided are transgenic non-human mammals 
capable of expressing nucleic acids encoding DS-CAM 
polypeptides so mutated as to be incapable of normal 
activity, i.e., mammals that do not express native DS-CAM. The
present invention also provides transgenic non-human 
mammals having a genome comprising antisense nucleic 
acids complementary to nucleic acids encoding DS-CAM 
polypeptides, placed so as to be transcribed into 
antisense mRNA complementary to mRNA encoding DS-CAM 
polypeptides, which hybridizes to the mRNA and, thereby, 
reduces the translation thereof. The nucleic acid may 
additionally comprise an inducible promoter and/or tissue 
specific regulatory elements, so that expression can be 
induced, or restricted to specific cell types. Examples 
of nucleic acids are DNA or cDNA having a coding sequence 
substantially the same as the coding sequence shown in 
SEQ ID NO:1. An example of a non-human transgenic mammal
is a transgenic mouse. Examples of tissue-specificity-
determining elements are the metallothionein promoter and
the L7 promoter. 

Animal model systems which elucidate the 
physiological and behavioral roles of DS-CAM polypeptides 
are also provided, and are produced by creating 
transgenic animals in which the expression of the DS-CAM
polypeptide is altered using a variety of techniques.
Examples of such techniques include the insertion of
normal or mutant versions of nucleic acids encoding a
DS-CAM polypeptide by microinjection, retroviral
infection or other means well known to those skilled in
the art into appropriate fertilized embryos to produce a
transgenic animal. See, for example, Hogan et al.,
Manipulating the Mouse Embryo: A Laboratory Manual (Cold
Spring Harbor Laboratory, 1986).

Also contemplated herein is the use of
homologous recombination of mutant or normal versions of
DS-CAM genes with the native gene locus in transgenic
animals, to alter the regulation of expression or the
structure of DS-CAM polypeptides (see Capecchi et al.,
Science 244:1288, 1989; Zimmer et al., Nature 338:150,
1989; which are incorporated herein by reference).
Homologous recombination techniques are well known in the
art. Homologous recombination replaces the native
(endogenous) gene with a recombinant or mutated gene to
produce an animal that cannot express native (endogenous)
protein but can express, for example, a mutated protein
that results in altered expression of DS-CAM
polypeptides.

In contrast to homologous recombination, 
microinjection adds genes to the host genome, without 
removing host genes. Microinjection can produce a 
transgenic animal that is capable of expressing both 
endogenous and exogenous DS-CAM protein. Inducible 
promoters can be linked to the coding region of nucleic 
acids to provide a means to regulate expression of the 
transgene. Tissue-specific regulatory elements can be
linked to the coding region to permit tissue-specific
expression of the transgene. Transgenic animal model 
systems are useful for in vivo screening of compounds for 
identification of specific ligands, i.e., agonists and 
antagonists, which activate or inhibit protein responses. 

Invention nucleic acids, oligonucleotides
(including antisense), vectors containing same,
transformed host cells, polypeptides and combinations
thereof, as well as antibodies of the present invention,
can be used to screen compounds in vitro to determine
whether a compound functions as a potential agonist or
antagonist to invention polypeptides. These in vitro
screening assays provide information regarding the
function and activity of invention polypeptides, which
can lead to the identification and design of compounds
that are capable of specific interaction with one or more
types of polypeptides, peptides or proteins.

In accordance with still another embodiment of 
the present invention, there is provided a method for 
identifying compounds which bind to DS-CAM polypeptides. 
The invention proteins may be employed in a competitive 
binding assay. Such an assay can accommodate the rapid 
screening of a large number of compounds to determine 
which compounds, if any, are capable of binding to DS-CAM 
proteins. Subsequently, more detailed assays can be 
carried out with those compounds found to bind, to 
further determine whether such compounds act as 
modulators, agonists or antagonists of invention
proteins.

Another application of the binding assay of the 
invention is the assay of test samples (e.g., biological 
fluids) for the presence or absence of DS-CAM. Thus, for
example, serum from a patient displaying symptoms thought
to be related to over- or under-production of DS-CAM can 
be assayed to determine if the observed symptoms are 
indeed caused by over- or under-production of DS-CAM. 

In another embodiment of the invention, there
is provided a bioassay for identifying compounds which
modulate the activity of invention DS-CAM polypeptides.
According to this method, invention polypeptides are
contacted with an "unknown" or test substance (in the
presence of a reporter gene construct when antagonist
activity is tested), the activity of the polypeptide is
monitored subsequent to the contact with the "unknown" or
test substance, and those substances which cause the
reporter gene construct to be expressed are identified as
functional ligands for DS-CAM polypeptides.

In accordance with another embodiment of the
present invention, transformed host cells that
recombinantly express invention polypeptides can be
contacted with a test compound, and the modulating
effect(s) thereof can then be evaluated by comparing the
DS-CAM-mediated response (e.g., via reporter gene
expression) in the presence and absence of test compound,
or by comparing the response of test cells and control
cells (i.e., cells that do not express DS-CAM
polypeptides) to the presence of the compound.

As used herein, a compound or a signal that
"modulates the activity" of invention polypeptides refers
to a compound or a signal that alters the activity of
DS-CAM polypeptides so that the activity of the invention
polypeptide is different in the presence of the compound
or signal than in the absence of the compound or signal.
In particular, such compounds or signals include agonists
and antagonists. An agonist encompasses a compound or a
signal that activates DS-CAM protein expression.
Alternatively, an antagonist includes a compound or
signal that interferes with DS-CAM protein expression.
Typically, the effect of an antagonist is observed as a
blocking of agonist-induced protein activation.
Antagonists include competitive and non-competitive
antagonists. A competitive antagonist (or competitive
blocker) interacts with or near the site specific for
agonist binding. A non-competitive antagonist or blocker
inactivates the function of the polypeptide by
interacting with a site other than the agonist
interaction site.
As understood by those of skill in the art,
assay methods for identifying compounds that modulate
DS-CAM activity generally require comparison to a
control. One type of "control" is a cell or culture
that is treated substantially the same as the test cell
or test culture exposed to the compound, with the
distinction that the "control" cell or culture is not
exposed to the compound. For example, in methods that
use voltage clamp electrophysiological procedures, the
same cell can be tested in the presence or absence of
compound, by merely changing the external solution
bathing the cell. Another type of "control" cell or
culture may be a cell or culture that is identical to the
transfected cells, with the exception that the "control"
cell or culture does not express native proteins.
Accordingly, the response of the transfected cell to
compound is compared to the response (or lack thereof) of
the "control" cell or culture to the same compound under
the same reaction conditions.
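The comparison with a "control" culture described above can be expressed as a ratio of fold-changes. The following Python sketch is illustrative only and is not part of the described method: the function name `compound_effect` and the ratio-of-ratios normalization are assumptions, and readouts are taken to be positive assay signals (for example, reporter activity) in arbitrary units.

```python
def compound_effect(test_with, test_without, control_with, control_without):
    """Hypothetical helper: DS-CAM-dependent effect of a compound.

    Divides the fold-change seen in DS-CAM-expressing (test) cells by the
    fold-change seen in control cells under the same reaction conditions.
    A value near 1.0 suggests the compound acts independently of DS-CAM.
    """
    if min(test_without, control_with, control_without) <= 0:
        raise ValueError("baseline and control readouts must be positive")
    test_fold = test_with / test_without  # response of transfected cells
    control_fold = control_with / control_without  # response of control cells
    return test_fold / control_fold


# Example: a 4-fold response in test cells versus 1.1-fold in control cells.
print(round(compound_effect(400.0, 100.0, 110.0, 100.0), 2))
```

Normalizing to the control culture in this way removes effects of the compound that would also appear in cells lacking the transfected DS-CAM.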

Since it is well-known that CAMs interact with 
extracellular ligands, it is contemplated that invention 
DS-CAM proteins interact with extracellular ligands. In 
another embodiment of the present invention, it is 
contemplated that invention DS-CAM proteins act 
specifically in concert or in competition with other 
CAMs. Thus, the present invention contemplates various
bioassays for identifying ligands for invention DS-CAM
proteins. In addition, the present invention
contemplates an assay measuring the effect of
co-expressing during development either normal or
defective invention DS-CAMs with other CAMs known in the
art to assess the resulting phenotype.

In one embodiment of the present invention,
there is provided a bioassay for evaluating whether test
compounds are capable of acting as agonists, comprising:

(a) culturing cells containing:

DNA which expresses DS-CAM
protein(s) or functional modified
forms thereof, and
DNA encoding a reporter protein,

wherein said DNA is operatively
linked to a DS-CAM responsive
transcription element;
wherein said culturing is carried out in
the presence of at least one compound
whose ability to induce signal
transduction activity of DS-CAM protein is
sought to be determined, and thereafter

(b) monitoring said cells for expression of
said reporter protein.
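Step (b) of the agonist bioassay is often scored as fold induction of the reporter over wells that received vehicle only. A minimal Python sketch, assuming replicate readouts per compound and a hypothetical hit threshold of 3-fold (neither the replicate design nor the threshold is specified in this description); the compound names are placeholders:

```python
from statistics import mean


def fold_induction(compound_wells, vehicle_wells):
    """Mean reporter signal with compound divided by mean vehicle signal."""
    return mean(compound_wells) / mean(vehicle_wells)


def call_agonist_hits(readouts, vehicle_wells, threshold=3.0):
    """Return compounds whose reporter induction meets the assumed threshold.

    readouts: dict mapping a compound name to its replicate reporter readouts.
    threshold: hypothetical fold-induction cutoff for calling a hit.
    """
    return sorted(
        name
        for name, wells in readouts.items()
        if fold_induction(wells, vehicle_wells) >= threshold
    )


vehicle = [100.0, 110.0, 90.0]
plate = {"cmpd_A": [450.0, 500.0, 550.0], "cmpd_B": [120.0, 130.0, 110.0]}
print(call_agonist_hits(plate, vehicle))
```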

In another embodiment of the present invention,
the bioassay for evaluating whether test compounds are
capable of acting as antagonists for DS-CAM protein(s) of
the invention, or functional modified forms of said
DS-CAM protein(s), comprises:

(a) culturing cells containing:

DNA which expresses DS-CAM
protein(s), or functional modified
forms thereof, and

DNA encoding a reporter protein,

wherein said DNA is operatively
linked to a DS-CAM responsive
transcription element;
wherein said culturing is carried out in
the presence of:

increasing concentrations of at
least one compound whose ability to
inhibit signal transduction activity
of DS-CAM protein(s) is sought to be
determined, and

a fixed concentration of at
least one agonist for DS-CAM
protein(s), or functional modified
forms thereof; and thereafter
(b) monitoring in said cells the level of
expression of said reporter protein as a
function of the concentration of said
compound, thereby indicating the ability
of said compound to inhibit signal
transduction activity.
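The concentration-dependent readout of step (b) is commonly summarized as the antagonist concentration that halves the agonist-driven reporter signal (an IC50). The Python sketch below is an illustration under stated assumptions, not the method of this description: the function name is hypothetical, the first and last points are taken as the uninhibited and fully inhibited levels, and the estimate is a linear interpolation on a log-concentration scale, whereas a full analysis would usually fit a sigmoidal curve.

```python
import math


def estimate_ic50(concs, signals):
    """Hypothetical helper: interpolated half-maximal inhibitory concentration.

    concs: increasing antagonist concentrations (all > 0).
    signals: reporter readouts at those concentrations, with the agonist held
             at a fixed concentration. signals[0] is treated as the uninhibited
             level and signals[-1] as the fully inhibited level.
    Returns None if the half-maximal level is never crossed.
    """
    top, bottom = signals[0], signals[-1]
    half = bottom + (top - bottom) / 2
    points = list(zip(concs, signals))
    for (c1, s1), (c2, s2) in zip(points, points[1:]):
        if s1 >= half >= s2 and s1 != s2:
            frac = (s1 - half) / (s1 - s2)  # position of 50% between the two points
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None


print(estimate_ic50([0.01, 0.1, 1.0, 10.0, 100.0], [1000.0, 950.0, 500.0, 60.0, 0.0]))
```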



In step (a) of the above-described antagonist
bioassay, culturing may also be carried out in the
presence of:

a fixed concentration of at least
one compound whose ability to inhibit
signal transduction activity of
DS-CAM protein(s) is sought to be determined, and

increasing concentrations of
at least one agonist for DS-CAM
protein(s), or functional modified
forms thereof.
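In this alternative format the antagonist is held fixed and the agonist concentration-response curve is observed to shift. One standard way to summarize such a shift, which is not stated in this description and applies only if the compound behaves as a simple competitive antagonist, is the Gaddum/Schild relation: the dose ratio DR (agonist EC50 with antagonist divided by agonist EC50 alone) satisfies DR = 1 + [B]/K_B, where [B] is the antagonist concentration and K_B its dissociation constant. A minimal Python sketch with a hypothetical function name:

```python
def apparent_kb(ec50_alone, ec50_with_antagonist, antagonist_conc):
    """Hypothetical helper: apparent antagonist K_B from one EC50 shift.

    Assumes simple competitive antagonism, so DR = 1 + [B]/K_B and hence
    K_B = [B] / (DR - 1). All concentrations must share the same units.
    """
    dose_ratio = ec50_with_antagonist / ec50_alone
    if dose_ratio <= 1.0:
        raise ValueError("no rightward shift; the relation does not apply")
    return antagonist_conc / (dose_ratio - 1.0)


# Agonist EC50 moves from 10 nM to 110 nM in the presence of 50 nM antagonist.
print(apparent_kb(10.0, 110.0, 50.0))
```

A non-competitive antagonist typically depresses the maximal response rather than producing a parallel shift, so this calculation would not be meaningful for it.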



In yet another embodiment of the present
invention, it is contemplated that invention DS-CAM
proteins mediate signal transduction through the
modulation of adenylate cyclase. For example, when a
DS-CAM ligand binds to DS-CAM, adenylate cyclase causes
an elevation in the level of intracellular cAMP.
Accordingly, in one embodiment of the present invention,
the bioassay for evaluating whether test compounds are
capable of acting as agonists or antagonists comprises:
(a) culturing cells containing:

DNA which expresses DS-CAM
protein(s) or functional modified
forms thereof,

wherein said culturing is carried out in
the presence of at least one compound
whose ability to modulate signal
transduction activity of DS-CAM protein is
sought to be determined, and thereafter
(b) monitoring said cells for either an
increase or decrease in the level of
intracellular cAMP.

Methods well-known in the art that measure 
intracellular levels of cAMP, or measure cyclase 
activity, can be employed in binding assays described 
herein to identify agonists and antagonists of the 
DS-CAM. For example, because activation of some CAMs 
results in decreases or increases in cAMP, assays that 
measure intracellular cAMP levels can be used to evaluate 
recombinant DS-CAMs expressed in mammalian host cells. 
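Step (b) of the cAMP bioassay asks only whether intracellular cAMP rises or falls. A minimal Python sketch of that call, assuming cAMP has already been quantified by one of the methods mentioned above and using a hypothetical 1.5-fold cutoff to separate real changes from assay noise (the cutoff is not specified in this description):

```python
def classify_camp_change(basal, treated, min_fold=1.5):
    """Label a cAMP readout as 'increase', 'decrease' or 'no change'.

    basal: cAMP level without the test compound (must be positive).
    treated: cAMP level after exposure to the test compound.
    min_fold: assumed fold-change cutoff.
    """
    if basal <= 0:
        raise ValueError("basal cAMP must be positive")
    fold = treated / basal
    if fold >= min_fold:
        return "increase"
    if fold <= 1.0 / min_fold:
        return "decrease"
    return "no change"


for treated in (20.0, 5.0, 11.0):
    print(treated, classify_camp_change(10.0, treated))
```

Whether an increase reflects agonism or antagonism depends on how DS-CAM couples to adenylate cyclase, which the passage above only contemplates.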

As used herein, "ability to modulate signal 
transduction activity of DS-CAM protein" refers to a 
compound that has the ability to either induce (agonist) 
or inhibit (antagonist) signal transduction activity of 
the DS-CAM protein. 

Each of the invention bioassays (e.g., those
described herein, and the like) can be conducted as
competitive assays by co-expressing one or more members
of the CAM immunoglobulin superfamily of proteins known
in the art, such as N-CAMs, along with invention DS-CAMs.
In addition, one or more members of the CAM
immunoglobulin superfamily of proteins known in the art
can be co-expressed with invention DS-CAMs to evaluate
the agonistic or antagonistic effect on signal
transduction of the non-DS-CAM members acting in concert
with invention DS-CAMs.
In yet another embodiment of the present
invention, the activation of DS-CAM polypeptides can be
modulated by contacting the polypeptides with an
effective amount of at least one compound identified by
the above-described bioassays.

Members of the N-CAM superfamily of
immunoglobulins have previously been implicated in
disease. For example, various alterations of N-CAM
levels have been seen in degenerative disease,
developmental defects, and toxic conditions. Increases
in the levels of N-CAM in the cerebrospinal fluid of
patients with multiple sclerosis have been observed to
parallel their clinical improvement (Massaro et al.,
Ital. J. Neurol. Sci. Suppl. 6:85-88, 1987). Levels of
N-CAM were reported to be elevated in the amniotic fluid
of mothers carrying fetuses with neural tube defects
(Ibsen et al., J. Neurochem. 41:363-366, 1983). Since
many such defects are likely to be due to mechanical
aberrations rather than genetic defects, confirmation of
these results would provide a new diagnostic component
for prenatal testing. Another provocative finding
relates to observations on the stimulation of Golgi
sialyltransferases by lead (Breen and Regan, Development
104:147-154, 1988; and Cookman et al., J. Neurochem.
49:399-403, 1987). Exposure to lead chloride markedly
stimulated sialyltransferase activity from postnatal days
16 to 30 in rats. This time is coincident with the
period when N-CAM normally becomes less sialylated. Thus
exposure to lead at critical developmental periods would
presumably lead to more highly sialylated, less adhesive
forms of N-CAM; this prevention of embryonic-to-adult
(E-A) conversion could have significant effects on neural
development. E-A conversion itself has been found to be
delayed in the mouse mutant staggerer (Edelman and Chuong,
Proc. Natl. Acad. Sci. USA 79:7036-7042, 1982) in
conjunction with the connectivity changes associated with
the mutation.



A role for DS-CAM in the Down Syndrome (DS) phenotype,
based on its chromosomal location and expression, is
supported by studies of patients with partial trisomy 21.
A subset of the DS features, including the typical facial
appearance and mental retardation, was attributed to
duplication of band 21q22 alone (Niebuhr, Humangenetik
21:99-101, 1974). Other studies mapped those features and
congenital heart disease to the region 21q22.2-q22.3 and
between D21S267 and MX1/MX2 (Korenberg et al., Am. J. Hum.
Genet. 50:294-302, 1992 and Korenberg et al., Proc. Natl.
Acad. Sci. USA 91:4997-5001, 1994), a region of about 4 Mb
that contains DS-CAM. The Ts65Dn mouse model of DS
contains the region of MMU16 (Pgk1-ps1 to MX1/2) that
includes DS-CAM and reveals some of the neurobehavioral
features of DS (Reeves et al., Nature Genet. 11:177-183,
1995 and Holtzman et al., Proc. Natl. Acad. Sci. USA
93:13333-13338, 1996).

Close to 6% of DS individuals have
Hirschsprung's disease (HSCR) (Garver et al., Clin.
Genet. 28:503-508, 1985) and more than 10% of all HSCR is
associated with DS (Passarge, New Eng. J. Med. 276:138-143,
1967). A modifier region of HSCR on chromosome
21q22 (D21S259-D21S156) has been reported in non-DS
HSCR (Puffenberger et al., Hum. Mol. Genet. 3:1217-1225,
1994). The DS-CAM gene maps within this small region.
The expression of DS-CAM in the neural crest-derived
enteric plexus of the gut was detected by mouse tissue in
situ hybridization (Example 7). The function of the DS-CAM
protein as a neural cell adhesion molecule and the
association of this region of chromosome 21 with HSCR
indicate that DS-CAM can play a role in the migration of
the cranial neural crest cells that populate this region.
Thus, DS-CAM overexpression is responsible for the
chromosome 21 association in non-DS HSCR and for the HSCR
seen in DS.

Mutations in the molecule CAM-L1, a molecule
more similar to DS-CAM than to N-CAM (Figure 4), have
established roles in human disease. These mutations result
in X-linked hydrocephalus (Rosenthal et al., Nature Genet.
2:107-112, 1992), type 1 X-linked spastic paraplegia and
the MASA syndrome (including mental retardation, aphasia,
shuffling gait, adducted thumbs and agenesis of the corpus
callosum) (Jouet et al., Nature Genet. 7:402-407, 1994).
The perturbation of development by the aneuploid
expression of CAM-L1 supports a role for the aneuploid
expression of DS-CAM in the causation of developmental
and neurological abnormalities.

In accordance with another embodiment of the
present invention, there are provided methods for
diagnosing DS-CAM-associated disease in a subject, such as
mental retardation, holoprosencephaly, agenesis of the
corpus callosum, or schizencephaly, said method comprising:

detecting, in said subject, a genomic or
transcribed mRNA sequence including SEQ ID NO:1
or SEQ ID NO:10, or fragments thereof.

Preferably, the DS-CAM nucleic acids detected in 
accordance with the invention diagnostic methods are 
either mutated in one form or another (such as point
mutations, deletions, and the like), or are overexpressed
relative to levels of DS-CAM expression in healthy 
non-diseased individuals. 

In accordance with another embodiment of the
present invention, there are provided diagnostic systems,
preferably in kit form, comprising at least one invention
nucleic acid in a suitable packaging material. The
diagnostic nucleic acids are derived from the
DS-CAM-encoding nucleic acids described herein. In one
embodiment, for example, the diagnostic nucleic acids are
derived from SEQ ID NO:1, SEQ ID NO:7, SEQ ID NO:8,
SEQ ID NO:9 or SEQ ID NO:10. Invention diagnostic
systems are useful for assaying for the presence or
absence of nucleic acid encoding DS-CAM in either genomic
DNA or in transcribed nucleic acid (such as mRNA or cDNA)
encoding DS-CAM.

A suitable diagnostic system includes at least
one invention nucleic acid, preferably two or more
invention nucleic acids, as a separately packaged
chemical reagent(s) in an amount sufficient for at least
one assay. Instructions for use of the packaged reagent
are also typically included. Those of skill in the art
can readily incorporate invention nucleic acid probes
and/or primers into kit form in combination with
appropriate buffers and solutions for the practice of the
invention methods as described herein.

As employed herein, the phrase "packaging
material" refers to one or more physical structures used
to house the contents of the kit, such as invention
nucleic acid probes or primers, and the like. The
packaging material is constructed by well known methods,
preferably to provide a sterile, contaminant-free
environment. The packaging material has a label which
indicates that the invention nucleic acids can be used
for detecting a particular sequence encoding DS-CAM,
including the nucleotide sequence set forth in
SEQ ID NO:1, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9 or
SEQ ID NO:10, thereby diagnosing the presence of, or a
predisposition for, holoprosencephaly, agenesis of the
corpus callosum, or several phenotypes of Down
Syndrome including mental retardation, and the like. In
addition, the packaging material contains instructions
indicating how the materials within the kit are employed
both to detect a particular sequence and to diagnose the
presence of, or a predisposition for, holoprosencephaly,
agenesis of the corpus callosum, or several
phenotypes of Down Syndrome including mental retardation,
and the like.

The packaging materials employed herein in 
relation to diagnostic systems are those customarily 
utilized in nucleic acid-based diagnostic systems. As 
used herein, the term "package" refers to a solid matrix 
or material such as glass, plastic, paper, foil, and the 
like, capable of holding within fixed limits an isolated 
nucleic acid, oligonucleotide, or primer of the present 
invention. Thus, for example, a package can be a glass 
vial used to contain milligram quantities of a 
contemplated nucleic acid, oligonucleotide or primer, or 
it can be a microtiter plate well to which microgram 
quantities of a contemplated nucleic acid probe have been 
operatively affixed. 

"Instructions for use" typically include a 
tangible expression describing the reagent concentration 
or at least one assay method parameter, such as the 
relative amounts of reagent and sample to be admixed, 
maintenance time periods for reagent/sample admixtures, 
temperature, buffer conditions, and the like. 

All U.S. patents and all publications mentioned 
herein are incorporated in their entirety by reference 
thereto. The invention will now be described in greater 
detail by reference to the following non-limiting
examples.
Materials and Methods



Unless otherwise stated, the present invention
was performed using standard procedures, as described,
for example, in Maniatis et al., Molecular Cloning:
A Laboratory Manual, Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, New York, USA, 1982; Sambrook et al.,
supra, 1989; Davis et al., Basic Methods in Molecular
Biology, Elsevier Science Publishing, Inc., New York,
USA, 1986; or Methods in Enzymology: Guide to Molecular
Cloning Techniques Vol. 152, S. L. Berger and A. R.
Kimmel, Eds., Academic Press Inc., San Diego, USA, 1987.

Libraries.

Construction of Bacterial Artificial Chromosome
(BAC) library. BAC library construction of total human
genomic DNA was performed as described in Shizuya et al.,
Proc. Natl. Acad. Sci. USA 89:8794-8797, 1992; and Hubert
et al., Genomics 41:218-226, 1997. Yeast artificial
chromosome (YAC) clones were obtained from the CEPH
mega-YAC library and grown under standard conditions
(Cohen et al., Nature 366:689-701, 1993).

P1 artificial chromosome (PAC) library
construction. A 3X human PAC library, designated RPCI-1
(Ioannou et al., Hum. Genet. 219-220, 1994), was
constructed as described (Ioannou et al., Nature Genet.
6:84-89, 1994). The library was arrayed in 384-well
dishes. Subsequently, STSs generated by sequencing of
clones using vector primers were used as hybridization
probes to gridded colony filters of the PAC library.

YAC DNA preparation. YAC clones were grown in
selective media, pelleted and resuspended in 3 ml of 0.9 M
sorbitol, 0.1 M EDTA pH 7.5, then incubated with 100 U of
lyticase (Sigma, St. Louis, MO) at 37°C for 1 hour. After
centrifugation for 5 minutes at 5,000 rpm, pellets were
resuspended in 3 ml of 50 mM Tris pH 7.45, 20 mM EDTA;
0.3 ml of 10% SDS was added and the mixture was incubated
at 65°C for 30 minutes. One ml of 5 M potassium acetate
was added and tubes were left on ice for 1 hour, then
centrifuged at 10,000 rpm for 10 minutes. The supernatant
was precipitated in 2 volumes of ethanol and pelleted at
6,000 rpm for 15 minutes. Pellets were resuspended in
TE, treated with RNase and re-extracted with
phenol-chloroform.

Analysis by fluorescence in situ hybridization
(FISH). PAC or BAC clones were biotinylated by
nick translation in the presence of biotin-14-dATP using
the BioNick Labeling Kit (Gibco-BRL). FISH was performed
essentially as described (Korenberg et al., Cytogenet.
Cell Genet. 69:196-200, 1995). Briefly, 400 ng of probe
DNA was mixed with 8 ng of human Cot-1 DNA (Gibco-BRL)
and 2 µg of sonicated salmon sperm DNA in order to
suppress possible background produced from repetitive
human sequences as well as yeast sequences in the probe.
The probes were denatured at 75°C, preannealed at 37°C for
one hour, and applied to denatured chromosome slides
prepared from normal male lymphocytes (Korenberg et al.,
supra, 1995). Post-hybridization washes were performed
at 40°C in 2X SSC/50% formamide followed by washes in 1X
SSC at 50°C. Hybridized DNAs were detected with
avidin-conjugated fluorescein isothiocyanate (Vector
Laboratories). One amplification was performed by using
biotinylated anti-avidin. For distinguishing chromosome
subbands precisely, a reverse banding technique was used,
which was achieved by chromomycin A3 and distamycin A
double staining (Korenberg et al., supra, 1995). The
color images were captured by using a Photometrics
Cooled-CCD camera and BDS image analysis software (Oncor
Imaging, Inc.).



Southern blot analysis. Gel electrophoresis of
DNA was carried out on 0.8% agarose gels in 1X TBE.
Transfer of nucleic acids to Hybond N+ nylon membrane
(Amersham) was performed according to the manufacturer's
instructions. Probes were labeled using the RadPrime
Labeling System (BRL). Hybridization was carried out at
42°C for 16 hours in 50% formamide, 5X SSPE, 5X
Denhardt's, 0.1% SDS, 100 mg/ml denatured salmon sperm
DNA. The filters were washed once in 1X SSC, 0.1% SDS at
room temperature for 20 minutes, and twice in 0.1X SSC,
0.1% SDS for 20 minutes at 65°C. The blots were exposed
onto X-ray film (Kodak, X-OMAT-AR).

Sequencing of PAC and BAC end clones. PAC
clones were inoculated into 500 ml of LB/kanamycin and
grown overnight. BAC clones were inoculated into 500 ml
of LB/chloramphenicol and grown overnight. DNAs were
isolated using QIAGEN columns according to the vendor's
protocol with one additional
phenol/chloroform/isoamyl alcohol extraction followed by
one additional chloroform/isoamyl alcohol extraction.
Clones were sequenced using the Gibco-BRL cycle
sequencing kit with standard T7 and SP6 primers.

EXAMPLE 1

Construction of BAC Contig 

To provide stable clones for gene isolation and
sequencing initiatives in the D21S55 to MX1 region,
contigs were constructed using Bacterial Artificial
Chromosomes (BACs) and P1 Artificial Chromosomes (PACs).
BAC library construction of total human genomic DNA was
performed as described (Shizuya et al., supra, 1992; Kim
et al., Genomics 34:213-218, 1996). A BAC library was
screened using several YACs spanning the region; a PAC
library (Ioannou et al., Nature Genet. 6:84-89, 1994) was
screened using radiolabeled STS PCR products and whole
BACs in gap-filling initiatives.

The location of these BAC and PAC clones was
confirmed by fluorescence in situ hybridization (FISH).
Clone-to-clone Southern blots using 24 new STSs (generated
from direct sequencing of BAC and PAC ends) along with 35
pre-existing STSs were used to show overlaps between BACs
and PACs. The STS density over the intervals covered in
BACs and PACs was 1 STS every 60 kb, and 79% of the
clones were positive for 2 or more STSs. Approximately
3.5 Mb of the 4-5 Mb D21S55 to MX1 interval is covered in
85 BACs and 25 PACs representing 4-fold coverage within
the contigs (Hubert et al., Genomics 41:218-226, 1997).
The minimal contig sizes as determined by counting only
non-overlapping clones are: 1100 kb, 900 kb, 510 kb, 380
kb and 270 kb. Insert size of BAC clones was measured by
pulsed-field gel electrophoresis after digesting
DNA with NotI.

EXAMPLE 2 

Direct cDNA Selection 

A modified direct cDNA selection technique
(Yamakawa et al., Hum. Mol. Genet. 4:709-716, 1995;
Yamakawa et al., Cytogenet. Cell Genet. 74:140-145, 1996)
was applied to BAC-423A5, BAC-430F1, BAC-628H2, BAC-371H8
and PAC-31P10 (Figure 1) by using cDNA from trisomy 21
human fetal brain, and the selected fragments were then
subcloned into a plasmid vector.



Total RNA was isolated from 14-week trisomy 21
fetal brain using TRI Reagent™ (Molecular Research Center,
Inc.). Poly(A)+ RNA was isolated using the Poly(A) Quick
mRNA isolation kit (STRATAGENE). Double-stranded cDNA
was synthesized using the Superscript™ Choice System (GIBCO
BRL) from 5 µg trisomy 21 fetal brain poly(A)+ RNA using
1 µg oligo(dT)15 or 0.1 µg random hexamer. The entire
synthesis reaction was purified by GeneClean II kit
(BIO 101, Inc.) and then kinased. A Sau3AI linker was
attached to the cDNA, which was subsequently digested with
Sau3AI. The reaction was purified using GeneClean.
An MboI linker was attached to the cDNA and the reaction
purified by GeneClean (Morgan et al., supra, 1992). The
synthesized product was amplified by PCR using one strand
of the MboI linker (5'-CCTGATGCTCGAGTGAATTC-3') (SEQ ID NO:4)
as a primer. PCR cycling conditions were 40 cycles of
94°C/15 seconds, 60°C/23 seconds, 72°C/2 minutes in 100
µl of 1X PCR buffer (Promega), 3 mM MgCl2, 5.0 units of
Taq polymerase (Promega), 2 µM primer and 0.2 mM dNTPs.

Nineteen BAC DNAs (total 2.5 µg) and 2 PAC DNAs
between the region ETS2 and MX1 were prepared using
QIAGEN plasmid kit and were biotinylated using Nick
Translation Kit and biotin-16-dUTP (Boehringer
Mannheim). 3 µg of heat-denatured PCR-amplified cDNA
was annealed with 3 µg of heat-denatured Cot-1 DNA (BRL)
in 100 µl hybridization buffer (750 mM NaCl, 50 mM
NaPO4 (pH 7.2), 5 mM EDTA, 5X Denhardt's, 0.05% SDS and 50%
formamide) at 42°C for two hours. After
prehybridization, 1.2 µg of heat-denatured biotinylated
BAC DNA was added and incubated at 42°C for 16 hours.
cDNA-BAC DNA hybrids were precipitated with EtOH and
dissolved in 60 µl of 10 mM Tris-HCl (pH 8.0), 1 mM EDTA.
After addition of 40 µl of 5 M NaCl, the DNA was incubated
with magnetic beads (Dynabeads M-280, Dynal) at 25°C for
1 hour with gentle rotating to allow attachment of the
DNA to the magnetic beads. The beads were then washed
twice by pipetting in 400 µl of 2X SSC, placing the tube
in a magnetic holder (MPC-E, Dynal) for 30 seconds and
removing the supernatant. Four additional washes were
performed in 0.2X SSC at 68°C for 10 minutes each with
transfer of the beads to new tubes at each wash. cDNAs
were eluted in 100 µl of distilled water for 10 minutes
at 80°C with occasional mixing. The eluted cDNAs were
amplified by PCR as described above. After twice
repeating the selection procedure using magnetic beads,
amplified cDNAs were digested with EcoRI and subcloned
into pBlueScript KS+ (STRATAGENE). Insert DNAs were
isolated from the subclones, and were analyzed by
Southern hybridization and DNA sequencing.

The direct cDNA selection procedure using 19
BACs and 2 PACs between ETS2 and MX1 generated a total of
145 unique cDNA fragments. GenBank and TIGR homology
searches using FASTA revealed matches to ETS2, HMG14,
PEP19, a Na,K-ATPase, Titan ESTs, MX1 region ESTs, and 14
ESTs of unknown function. A cDNA library from a trisomy
21 fetal brain at 14 weeks gestation was screened using
one of these unique cDNA fragments, labeled "E51"
(SEQ ID NO:3).

EXAMPLE 3 

Isolation of human DS-CAM cDNA using cDNA Library
Screening

A trisomy 21 human fetal brain (14 weeks of
age) cDNA library was constructed using the ZAP-cDNA®
synthesis kit (STRATAGENE), which generates a
unidirectional cDNA library. Briefly, double-stranded
cDNA was synthesized from 5 µg trisomy 21 fetal brain
poly(A)+ RNA using a hybrid oligo(dT)-XhoI linker primer
with 5-methyl dCTP. An EcoRI linker was attached to the
cDNA, which was subsequently digested with EcoRI and XhoI,
and then cloned into the Uni-ZAP XR vector (STRATAGENE).
The library was packaged using Gigapack® II Gold packaging
extract. The titer of the original library was 1.1 x lO'
p.f.u./package. The library was amplified once. A
blue-white color assay indicated that 99% of the clones
had inserts.

Screening of the trisomy 21 fetal brain cDNA
library was performed using one of the 145 unique cDNA
fragments, labeled "E51" (SEQ ID NO:3), prepared as
described above. Phages were plated to an average
density of 1 x lO" per 175 cm² plate. Plaque lifts of 20
plates (2 X lO' phages) were made using duplicate nylon
membranes (Hybond-N+; Amersham). Hybridized membranes
were washed to a final stringency of 0.2X SSC, 0.1X SDS at
65°C. The filters were exposed overnight onto X-ray
film.

Sixty-two positive clones were identified out of the 2
X 10' clones in the original library. Eighteen of these
positive phage clones were converted to plasmids, and
their DNAs were isolated. These cDNAs were independently
numbered as separate DS-CAM (Down Syndrome Cell Adhesion
Molecule) clones. The length of the inserts of these
clones ranged from 2.4 kb to 5.6 kb. Exon trapping
(Buckler et al., Proc. Natl. Acad. Sci. USA 88:4005-4009,
1991; Church et al., Nature Genet. 6:98-105, 1994) was
also used to isolate cDNAs in the BAC and PAC contig.
With this approach, three exons identified from BAC-539E7
and one from BAC-430F1 were found to identify the same
sequences as those isolated by cDNA selection.

Sequence analysis of one of the clones, labeled
DS-CAM-42, revealed a 6110 bp DNA sequence that
contained a large ORF (5687 bp) as well as 3'-UTR
sequence (423 bp), but the 5'-UTR and start codon were not
located in clone DS-CAM-42. To characterize the 5' end,
two further clones, DS-CAM-18 of 6.5 kb and DS-CAM-52 of
6.6 kb, were analyzed. Sequence analyses of these
clones near the 5' end overlap with sequence at the
5' end of DS-CAM-42. However, DS-CAM-18 extends 416 bp
and DS-CAM-52 extends 494 bp farther 5' than
DS-CAM-42. The extra 494 bp sequence extends the ORF by
43 bp at the 5' end and contains a start codon. Two stop
codons occur 330 bp and 427 bp upstream of the start
codon. The 494 bp of additional 5' sequence found in
DS-CAM-52 combined with DS-CAM-42 (6604 bp) yield a
consensus cDNA that encodes one isoform of the invention
protein, labeled DS-CAM1. The DS-CAM1 cDNA contains an
open reading frame of 5730 bp (SEQ ID NO:1) coding for a
1910 amino acid protein (SEQ ID NO:2; approximately 211
kilodaltons), flanked by 452 bp of 5'-UTR and 422 bp of
3'-UTR. The 5'-UTR is highly GC rich (81% GC over 452
bp) and contains 13 MspI sites, as well as 72 CG and 93
GC dinucleotide pairs.
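The 5'-UTR composition figures above can, in principle, be recomputed from the first 452 nucleotides of SEQ ID NO:1. The sketch below is one way to do it, assuming overlapping counts of the CG and GC dinucleotides and treating each occurrence of CCGG (the MspI recognition site) on the listed strand as one site; the source does not say how its counts were made. The mass helper applies a rule-of-thumb average of about 110 Da per residue, an approximation rather than a value from this document, which gives roughly 210 kDa for 1910 residues, in line with the stated figure of approximately 211 kilodaltons.

```python
def composition(seq: str) -> dict:
    """Summarize base composition of a DNA sequence (case-insensitive)."""
    s = seq.upper()
    n = len(s)
    gc = sum(1 for b in s if b in "GC")
    # Overlapping sliding-window counts of the two dinucleotides.
    cg = sum(1 for i in range(n - 1) if s[i:i + 2] == "CG")
    gc_pairs = sum(1 for i in range(n - 1) if s[i:i + 2] == "GC")
    # MspI recognizes the palindrome CCGG; count occurrences on this strand.
    mspi = sum(1 for i in range(n - 3) if s[i:i + 4] == "CCGG")
    return {
        "length": n,
        "gc_percent": 100.0 * gc / n if n else 0.0,
        "CG": cg,
        "GC": gc_pairs,
        "MspI_sites": mspi,
    }


def approx_mass_kda(n_residues: int, avg_residue_da: float = 110.0) -> float:
    """Rough protein mass from an ASSUMED average residue mass (rule of thumb)."""
    return n_residues * avg_residue_da / 1000.0


if __name__ == "__main__":
    # Toy sequence for illustration only; it is not taken from SEQ ID NO:1.
    print(composition("GCCGGCGCCGGA"))
    # A 5730 bp ORF gives 1910 codons; the rule of thumb gives about 210 kDa.
    print(5730 // 3, round(approx_mass_kda(5730 // 3), 1))  # 1910 210.1
```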

The DS-CAM1 protein contains an extracellular
component at the N-terminus consisting of nine tandemly
repeated Ig-like C2 type domains and a tenth Ig-like C2
domain located between domains four and five of an array
of six repeated fibronectin type III domains (Figure 2).
Each Ig-like C2 domain consists of approximately 100
amino acids with a pair of conserved cysteines separated
by 49-56 residues. A single transmembrane domain of 22
amino acids was defined by using the TMBASE program
(Hoffmann and Stoffel, Biol. Chem. Hoppe-Seyler 374:166,
1993). The remaining 294 amino acids at the C-terminus,
corresponding to the cytoplasmic domain, have partial
homologies to the mouse M-phase inducer phosphatase 2
(Kakizuka et al., Genes Dev. 6:578-590, 1992) in two
regions, one with 34% identity and 52% similarity over 46
bp and a second with 38% identity and 52% similarity over
21 bp. Further partial homologies are to the homolog of
the Drosophila glass gene (O'Neill et al., Proc. Natl.
Acad. Sci. USA 92:6557-6561, 1995), with 30% identity and
52% similarity over 42 bp, and to the mouse delta opioid
receptor (Evans et al., Science 258:1952-1955, 1992), with
43% identity and 60% similarity over 30 bp. The putative
protein contains 16 potential N-glycosylation sites.
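The two sequence features counted above, potential N-glycosylation sites and Ig-domain cysteine pairs, can be located with simple scans. This Python sketch assumes the conventional N-X-S/T sequon (X not proline) as the definition of a potential N-glycosylation site, and reads "separated by 49-56 residues" as the number of residues lying between the two cysteines; both readings are assumptions. A cysteine-pair scan is only a crude first filter and cannot by itself identify Ig-like domains.

```python
import re

# N-X-S/T with X != P; the zero-width lookahead also finds overlapping sequons.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def sequon_positions(protein: str) -> list[int]:
    """1-based positions of the Asn in each N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def cysteine_pairs(protein: str, min_gap: int = 49, max_gap: int = 56) -> list[tuple[int, int]]:
    """1-based (Cys, Cys) pairs with min_gap..max_gap residues lying between them."""
    cys = [i for i, aa in enumerate(protein.upper()) if aa == "C"]
    pairs = []
    for a in cys:
        for b in cys:
            between = b - a - 1  # residues strictly between the two cysteines
            if min_gap <= between <= max_gap:
                pairs.append((a + 1, b + 1))
    return pairs


if __name__ == "__main__":
    # Toy peptides for illustration; they are not DS-CAM sequence.
    print(sequon_positions("MNGSANPTQNCT"))
    print(cysteine_pairs("C" + "A" * 50 + "C"))
```

Applied to the full SEQ ID NO:2 sequence, the sequon count is what would be compared with the 16 potential sites reported above.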

A homology search of the predicted amino acid
sequence of the 5730 bp open reading frame of DS-CAM1
(SEQ ID NO:1) against genes registered in the GenBank and
EMBL databases was conducted using the BLAST-P program
(Altschul et al., J. Mol. Biol. 215:403-410, 1990). The
predicted amino acid sequence showed homologies to
multiple proteins (Figure 4), including CAM-L1 (Moos et
al., Nature 334:701-703, 1988), BIG-1 (brain-derived
immunoglobulin (Ig) superfamily molecule-1) (Yoshihara et
al., Neuron 13:415-426, 1994) and DCC (deleted in colon
cancer) (Fearon et al., Science 247:49-56, 1990), and
identified DS-CAM as defining a novel class of the
immunoglobulin (Ig) superfamily. Homology searches with
sequences of the Ig type-C2 domains and fibronectin type-III
domains of the most highly related Ig-superfamily members
(CAM-L1, DCC, and axonin-1) were conducted using the
FASTA program (Pearson and Lipman, Proc. Natl. Acad. Sci.
USA 85:2444-2448, 1988).
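Percent identity and similarity figures, such as those quoted above for the cytoplasmic domain, are computed over the columns of a pairwise alignment. BLAST, for example, reports "Positives" as aligned positions with a positive substitution-matrix score (BLOSUM62 by default for protein searches). The sketch below replaces a real matrix with a small hypothetical table of conservative residue groups, so its similarity values will not match BLAST or FASTA output exactly. It only illustrates how the two percentages are defined.

```python
# HYPOTHETICAL conservative-substitution groups standing in for a real
# substitution matrix; BLAST counts a "positive" when the matrix score is > 0.
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG")


def is_similar(a: str, b: str) -> bool:
    """Identical residues, or two residues from the same hypothetical group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Percent identity and similarity over all columns of a gapped pairwise alignment."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    if not aln1:
        raise ValueError("empty alignment")
    cols = len(aln1)  # gap columns count toward the alignment length
    ungapped = [(a, b) for a, b in zip(aln1, aln2) if a != "-" and b != "-"]
    ident = sum(1 for a, b in ungapped if a == b)
    sim = sum(1 for a, b in ungapped if is_similar(a, b))
    return 100.0 * ident / cols, 100.0 * sim / cols


if __name__ == "__main__":
    # Toy alignment for illustration only.
    print(identity_similarity("ACDEFG", "ACDQYG"))
```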



In addition, a splice variant cDNA sequence
encoding a non-membrane-bound isoform of DS-CAM1,
referred to herein as DS-CAM2, is provided. Two
human DS-CAM cDNA clones (DS-CAM-18 and DS-CAM-52) were
found to contain identical 191 bp deletions that occur
in neighboring exons and that remove bp 5133 to 5323 of
the SEQ ID NO:1 cDNA sequence encoding DS-CAM1 (Figure
3). The resulting splice variant transcript encoding
DS-CAM2 (SEQ ID NO:10) lacks the entire transmembrane
domain, which is encoded by the more 3' of these exons.
Further, the deletion shifts the reading frame and creates
a stop codon 36 bp downstream of the deletion, resulting in
a soluble extracellular protein of 1571 amino acids
(SEQ ID NO:11). The distal border of the deletion contains
the canonical AG of the RNA splicing consensus acceptor
site. The proximal border contains a variant of the donor
splice site consensus sequence (Jackson, Nucl. Acids Res.
19:3795-3798, 1991).
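The reading-frame arithmetic in the paragraph above can be checked with a few lines of Python. This sketch takes the CDS location from the SEQ ID NO:1 listing (nucleotides 453 to 6185, stop codon included). It assumes that "36 bp downstream of the deletion" means the new stop codon ends 36 bases after the deletion junction, which is an interpretation, not a statement from the source. Under that assumption, the 191 bp deletion, the frameshift, and the 1571-residue DS-CAM2 product are mutually consistent.

```python
def full_length_aa(cds_start: int, cds_end: int) -> int:
    """Residues encoded by a CDS whose 1-based range includes the stop codon."""
    return (cds_end - cds_start + 1 - 3) // 3


def deletion_effect(cds_start: int, del_start: int, del_end: int, stop_end_offset: int) -> dict:
    """Predict the product of a transcript carrying an internal deletion.

    Coordinates are 1-based cDNA positions. stop_end_offset is the ASSUMED
    distance from the deletion junction to the last base of the new stop codon.
    """
    del_len = del_end - del_start + 1
    nt_before = del_start - cds_start            # coding bases upstream of the deletion
    coding_nt = nt_before + stop_end_offset - 3  # exclude the stop codon itself
    if coding_nt % 3:
        raise ValueError("new stop codon is not in frame with the start codon")
    return {
        "deletion_bp": del_len,
        "frameshift": del_len % 3 != 0,
        "codons_before_deletion": nt_before // 3,
        "protein_aa": coding_nt // 3,
    }


if __name__ == "__main__":
    print(full_length_aa(453, 6185))             # 1910 (DS-CAM1)
    print(deletion_effect(453, 5133, 5323, 36))  # protein_aa is 1571 (DS-CAM2)
```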



To confirm that the DS-CAM cDNA originated from
the BACs and PACs in the Down syndrome region and to
determine the genomic size of DS-CAM, the longest DS-CAM
cDNA clones (DS-CAM-42, 6.1 kb; DS-CAM-18, 6.5 kb;
DS-CAM-52, 6.6 kb) were hybridized to Southern blots
containing the BAC and PAC clone contig (Figure 1).
DS-CAM-42, -18 and -52 hybridized to BACs 423A5, 430F1,
628H2, 539E7, 371H8, 825E1, 593D1, 261F12, 30E4, 385B7
and 388F4, and to PACs 31P10 and 58D10. BACs 816F6, 116E8,
720G4 and 619H8 were positive only for DS-CAM-18 and
DS-CAM-52 and negative for DS-CAM-42. All other BACs
shown in Figure 1 were negative. These results indicate
that the DS-CAM gene spans 900-1200 kb of genomic DNA and
covers a gap in this BAC and PAC contig, indicated by an
arrowhead, as well as in the available YAC contigs
(Korenberg et al., Genome Res. 5:427-443, 1995; Gardiner
et al., Somat. Cell Mol. Genet. 21:399-414, 1995).
DS-CAM cDNA sequences were confirmed to originate from
these BACs and PACs by direct sequencing, using the BACs
and PACs as templates with cDNA sequence-specific primers.

The map position of DS-CAM on chromosome
21q22.2-22.3 was confirmed by using clone DS-CAM-42 as a
probe for fluorescence in situ hybridization. Two
independent experiments were performed and over 100
metaphase cells were evaluated. Signals were clearly
seen on two chromatids of at least one chromosome in 85%
of cells. No other double-signal sites were seen
in more than 1% of cells.



EXAMPLE 4 



Northern Blot Analysis Of Human DS-CAM Expression

Inserts containing DS-CAM cDNA were excised
from the base vector by digestion with XhoI and EcoRI.
After labeling by the random priming method (RadPrime
Labeling System; GIBCO BRL) and purification
on G-50 Sephadex columns (Quick Spin Column;
Boehringer Mannheim), the fragments were used as probes
for Northern hybridization using a Multiple Tissue Northern
Blot (Clontech). A Northern blot assay was conducted
using DS-CAM cDNA as a probe in various fetal and adult
tissues, including heart, brain, placenta, lung, liver,
skeletal muscle, kidney, and pancreas. Northern
hybridization was performed following the
manufacturer's instructions. The hybridized membrane was
washed at a final stringency of 0.1X SSC and 0.1X SDS at
50°C. The filter was exposed to X-ray film (Kodak X-OMAT
AR) at -70°C for 1-5 days.

The results of Northern analysis using human
fetal tissues showed that 8.5 kb and 7.6 kb transcripts
are expressed only in fetal brain and not in
fetal lung, fetal liver or fetal kidney. In adult
tissues, three transcripts of 9.7 kb, 8.5 kb, and 7.6 kb
are present in the brain. Placenta shows faint bands
of sizes similar to those in brain. In skeletal
muscle, a faint smaller band (6.5 kb) is detected. In
multiple parts of the adult human brain, transcripts of
9.7 kb, 8.5 kb and 7.6 kb are differentially expressed.
The 9.7 kb transcript is highly expressed in the
substantia nigra, moderately expressed in amygdala and
hippocampus, and expressed at lower levels in whole brain.
A similar pattern is obtained using a PCR product that
spans the 191 bp deletion found in clones DS-CAM-18 and
DS-CAM-52, which encode the splice variant sequence
corresponding to DS-CAM2. Thus, splice variant cDNA
transcripts encoding a DS-CAM family of proteins are
clearly contemplated by the present invention.



EXAMPLE 5 



RT-PCR Assays Of Human DS-CAM Expression

Reverse-transcriptase polymerase chain reaction
(RT-PCR) assays on cDNA libraries of various human
tissues were conducted using primers B9-131F
(SEQ ID NO:5) and B9-131R (SEQ ID NO:6). The results
demonstrated expression of human DS-CAM mRNA in fetal and
adult brain and in fetal kidney. In addition, a breast
carcinoma cell line showed expression of human DS-CAM
mRNA.

The cDNAs from 13 independent human fetal and
adult sources were analyzed by PCR using primer pairs
that flank the alternatively spliced region responsible
for the 191 base pair deletion of nucleotides 5133-5323
of the DS-CAM1 cDNA set forth in SEQ ID NO:1. The
primers were designed to generate products of different
sizes for each of the two alternatively spliced
transcripts: 536 bp corresponding to the non-deleted
DS-CAM1 transcript and 345 bp corresponding to the
deleted DS-CAM2 transcript. The analyses included adult
samples from amygdala (24 years), skeletal muscle (36
years) and three independent lymphoblastoid cell lines.
Fetal samples included whole brain of a trisomy 21 fetus
(14 weeks), four from whole brain (4.5-13 weeks), one
from temporal lobe (28 weeks) and two from heart (4.5 and
13 weeks). All fetal and adult samples produced two bands
corresponding to PCR products of the predicted sizes,
indicating the expression of two alternatively spliced
transcripts.
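Because the two PCR products described above differ by exactly the 191 bp removed in DS-CAM2 (536 - 345 = 191), scoring a gel reduces to matching observed band sizes against the two expected products. A minimal Python sketch follows. The 15 bp matching tolerance is a hypothetical choice, not a value from the source.

```python
DELETION_BP = 191  # nucleotides 5133-5323 of SEQ ID NO:1

EXPECTED = {
    "DS-CAM1 (full length)": 536,
    "DS-CAM2 (191 bp deleted)": 345,
}
# The two products should differ by exactly the deleted segment.
assert EXPECTED["DS-CAM1 (full length)"] - EXPECTED["DS-CAM2 (191 bp deleted)"] == DELETION_BP


def classify_bands(observed_bp, expected=EXPECTED, tolerance_bp=15):
    """Map each observed band size to the nearest expected product, or None if too far off.

    tolerance_bp is a HYPOTHETICAL gel-sizing tolerance.
    """
    calls = {}
    for size in observed_bp:
        name, ref = min(expected.items(), key=lambda item: abs(item[1] - size))
        calls[size] = name if abs(ref - size) <= tolerance_bp else None
    return calls


if __name__ == "__main__":
    # Made-up band sizes for illustration.
    print(classify_bands([540, 350, 450]))
```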



EXAMPLE 6 

Isolation of mouse DS-CAM cDNA clones

A mouse brain cDNA library was prepared from 19-week-old
female C57 Black/6 mice in the Uni-ZAP XR vector
(STRATAGENE). The cDNAs were oligo-dT primed and cloned
unidirectionally into the EcoRI and XhoI sites of the
vector. The average insert size is 1.0 kb. The library
was screened using a human DS-CAM cDNA clone as a probe.
Two partial mouse DS-CAM cDNA clones were isolated and
sequenced. The combined nucleotide sequences of these
clones are set forth in SEQ ID NO:7, SEQ ID NO:8 and
SEQ ID NO:9, and were found to represent the 5', middle
and 3' portions, respectively, of cDNA encoding a mouse
DS-CAM.

EXAMPLE 7 

Hybridization analysis of DS-CAM cDNA in mouse tissues

BALB/c and C57BL/6 x DBA/2 embryos, fetuses and
postnatal brains were fixed and embedded as described in
detail in Lyons et al. (J. Neurosci. 15:5727-5738,
1995). Embryos were fixed in 4% paraformaldehyde in
phosphate-buffered saline (PBS) overnight, dehydrated and
infiltrated with paraffin. Five- to seven-micron serial
sections were mounted on gelatinized slides. Two
sections were mounted per slide, deparaffinized in xylene,
rehydrated and post-fixed. The sections were digested
with proteinase K, post-fixed, treated with
triethanolamine/acetic anhydride, washed and dehydrated.
cRNA probes were prepared from DS-CAM-M-14. The plasmid
was linearized with XbaI and T7 polymerase was used to
generate the antisense cRNA. The plasmid was linearized
with KpnI and T3 polymerase was used to generate the
sense control cRNA. The cRNA transcripts were
synthesized according to the manufacturer's conditions
(STRATAGENE) and labeled with 35S-UTP (>1000 Ci/mmol;
Amersham). cRNA transcripts larger than 100 nucleotides
were subjected to alkaline hydrolysis to give a mean size
of 70 bases for efficient hybridization.
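Limited alkaline hydrolysis of riboprobes is commonly timed with the formula t = (L0 - Lf) / (k * L0 * Lf), where L0 and Lf are the starting and desired lengths in kilobases and k is a strand-scission rate constant. The document does not give its hydrolysis conditions. The sketch below therefore assumes the commonly quoted k of about 0.11 per kb per minute (carbonate buffer at 60 °C) and a hypothetical 500-nt starting transcript. Treat the result as an estimate to be checked empirically.

```python
def hydrolysis_minutes(start_kb: float, final_kb: float, k: float = 0.11) -> float:
    """Estimate limited alkaline hydrolysis time, t = (L0 - Lf) / (k * L0 * Lf).

    start_kb, final_kb: starting and desired mean lengths in kilobases.
    k: ASSUMED strand-scission rate (per kb per minute); 0.11 is a commonly
       quoted value for carbonate buffer at 60 C, not a figure from this document.
    """
    if not 0 < final_kb < start_kb:
        raise ValueError("require 0 < final length < starting length")
    return (start_kb - final_kb) / (k * start_kb * final_kb)


if __name__ == "__main__":
    # Hypothetical 500-nt transcript cut down to the 70-nt mean size used above.
    print(round(hydrolysis_minutes(0.5, 0.070), 1))  # about 111.7 minutes
```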

Sections were hybridized overnight at 52°C in
50% deionized formamide, 0.3 M NaCl, 20 mM Tris-HCl pH
7.4, 5 mM EDTA, 10 mM NaPO4, 10% dextran sulfate, 1x
Denhardt's, 50 µg/ml total yeast RNA, and 50-75,000
cpm/µl 35S-labeled cRNA probe. The tissue was subjected to
stringent washing at 65°C in 50% formamide, 2X SSC, 10 mM
DTT and washed in PBS before treatment with 20 µg/ml
RNase A at for 30 minutes. Following washes in 2X
SSC and 0.1X SSC for 10 minutes at 37°C, the slides were
dehydrated, dipped in Kodak NTB-2 nuclear track
emulsion and exposed for 2-3 weeks in light-tight boxes
with desiccant at 4°C. Photographic development was
carried out in Kodak D-19. Slides were counterstained
lightly with toluidine blue and analyzed using both
light- and darkfield optics of a Zeiss Axiophot
microscope. Sense control cRNA probes (identical to the
mRNAs) always gave background levels of hybridization
signal. Embryonic structures were identified with the
help of the following atlases: Rugh (The Mouse: Its
Reproduction and Development, Oxford Univ. Press, Oxford,
UK, 1990), Kaufman (The Atlas of Mouse Development,
Acad. Press, New York, NY, 1992), and Altman and Bayer
(supra, 1995).
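Making up the hybridization buffer described above is a series of C1·V1 = C2·V2 dilutions. The Python sketch below uses the final concentrations listed in that paragraph and the low end of the stated probe range (50,000 cpm/µl). Every stock concentration, the 1 ml final volume, and the 1 x 10^6 cpm/µl probe stock are hypothetical values chosen for illustration. Water makes up the remainder.

```python
# Final concentrations come from the hybridization buffer described above.
# Stock concentrations are HYPOTHETICAL; each (stock, final) pair shares one unit.
RECIPE = {
    "formamide (%)": (100.0, 50.0),
    "NaCl (M)": (5.0, 0.3),
    "Tris-HCl pH 7.4 (mM)": (1000.0, 20.0),
    "EDTA (mM)": (500.0, 5.0),
    "NaPO4 (mM)": (1000.0, 10.0),
    "dextran sulfate (%)": (50.0, 10.0),
    "Denhardt's (x)": (50.0, 1.0),
    "yeast RNA (ug/ml)": (10000.0, 50.0),
}


def mix_volumes(total_ul, probe_stock_cpm_per_ul, probe_final_cpm_per_ul, recipe=RECIPE):
    """Return the volume (ul) of each stock via C1*V1 = C2*V2, plus water to volume."""
    vols = {name: final * total_ul / stock for name, (stock, final) in recipe.items()}
    # The labeled probe is diluted the same way, using activity (cpm/ul) as concentration.
    vols["35S cRNA probe"] = probe_final_cpm_per_ul * total_ul / probe_stock_cpm_per_ul
    water = total_ul - sum(vols.values())
    if water < 0:
        raise ValueError("stocks are too dilute to reach this final volume")
    vols["water"] = water
    return vols


if __name__ == "__main__":
    # Hypothetical: 1 ml of buffer, 1e6 cpm/ul probe stock, 50,000 cpm/ul final.
    for name, ul in mix_volumes(1000.0, 1e6, 5e4).items():
        print(f"{name:22s} {ul:8.1f} ul")
```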

Tissue in situ hybridization analysis was
performed using a mouse cDNA as a probe on sections of
normal mouse embryos from days 8.5-17.5 post coitum (pc),
as well as on newborn, two-week-old and adult brains, as
described above. The results indicate that there is no
detectable expression of DS-CAM at 8.5 days pc. At 9.5
days pc, expression was detected in the neuroepithelium.
Low levels of expression were detected within the
branchial arches, suggestive of migrating neural crest
cells. At 10.5 days pc, the trigeminal ganglia (neural
crest derived) begin to express the transcript, and
expression within the branchial arches was more evident.

Expression at 11.5 days pc was abundant
throughout the brain. The transcript was found within
the regions of the nervous system that differentiate
earliest during development (Altman and Bayer, supra,
1995). In the brain, these include the ventral-most
regions, such as the thalamus and medulla. Some
expression was detected within the olfactory epithelium.
Expression within the neural tube begins in two areas:
the ventrolateral columns (corresponding to the areas in
which the motor neurons differentiate) and the lateral gray
columns (which later form commissural neurons) (Leber et
al., J. Neurosci. 15:1236-1248, 1990). The dorsal root
ganglia (neural crest derived) expressed the transcript
at 11.5 days pc. The trigeminal ganglia show higher
levels at 11.5 days pc than at 10.5 days. Migrating
neural crest cells can be seen within the maxilla,
the mandibular arch, and the developing gut. Signal
was observed within the mesenchyme surrounding the
umbilical vein and artery.

At 12.5 days pc, expression was more extensive
than at 11.5 days pc. More of the nervous system
exhibits expression of the transcript, including a larger
portion of the midbrain, the pontine areas, the basal ganglia
and the outermost layer of the cortex. Neurons in this layer
have undergone mitosis in the subependymal layer of the
cortex and migrated into the mantle layer of the cerebral
cortex as differentiated cells (Smart et al., J. Comp.
Neurol. 116:325-347, 1961).

At 13.5 days pc, expression was seen throughout
most of the brain. The outermost layer of the gut also
appears to be expressing at this stage; these cells are
neural crest derived and form the myenteric ganglia. At
15.5 and 16.5 days pc, most of the neural crest derived
neural structures show some expression. For example, the
regions of the snout that will develop into the sensory
structures at the base of the vibrissae, the pancreatic
ganglia, the heart ganglion, the enteric nervous system,
and the sympathetic trunk all express the transcript.

There is no expression within the umbilicus at
this stage. Two non-neuronal structures express this
gene: the gonad and the annulus fibrosus of the
intervertebral disk. The olfactory bulb exhibits signal
both in the granule cells and within the tufted mitral
cells. Within the newborn brain, the transcript was
expressed most extensively within the differentiating
regions, such as the septal area, olfactory bulb, inferior
colliculus and hippocampus. In the adult brain, the gene
was expressed in many areas, including amygdala, cortex,
hippocampus and thalamus. In the adult cerebellum, the
transcripts were detected in the Purkinje cell layer and
in the deep cerebellar nuclei.



While the invention has been described in
detail with reference to certain preferred embodiments
thereof, it will be understood that modifications and
variations are within the spirit and scope of that which
is described and claimed.
Summary of Sequences



SEQ ID NO: 1 is the nucleic acid sequence (and the
deduced amino acid sequence) of cDNA encoding a novel
human DS-CAM1 protein of the present invention.

SEQ ID NO: 2 is the deduced amino acid sequence of a
human DS-CAM1 protein of the present invention.

SEQ ID NO: 3 is the cDNA probe (labeled "E51") used to 
isolate cDNA encoding human DS-CAM. 

SEQ ID NO: 4 is an MboI linker sequence.

SEQ ID NO: 5 is a primer labeled B9-131F used in the 
RT-PCR assay described in Example 5. 

SEQ ID NO: 6 is a primer labeled B9-131R used in the 
RT-PCR assay described in Example 5. 

SEQ ID NO: 7 is the 5' region of a partial mouse-derived 
cDNA clone encoding an invention DS-CAM protein. 

SEQ ID NO: 8 is the middle region of a partial
mouse-derived cDNA clone encoding an invention DS-CAM
protein.

SEQ ID NO: 9 is the 3' region of a partial mouse-derived
cDNA clone encoding an invention DS-CAM protein. 

SEQ ID NO: 10 is the nucleic acid sequence (and the 
deduced amino acid sequence) of cDNA encoding a novel 
human DS-CAM2 protein of the present invention. 

SEQ ID NO: 11 is the deduced amino acid sequence of a
human DS-CAM2 protein of the present invention, which is
a splice variant of DS-CAM1 (SEQ ID NO: 2).
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Korenberg, Julie R.

(ii) TITLE OF INVENTION: NUCLEIC ACID ENCODING DS-CAM 
PROTEINS AND PRODUCTS RELATED THERETO 

(iii) NUMBER OF SEQUENCES: 11 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Campbell and Flores 

(B) STREET: 4370 La Jolla Village Drive, Suite 700 

(C) CITY: San Diego 

(D) STATE: CA 

(E) COUNTRY: USA 

(F) ZIP: 92122 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/029,322 

(B) FILING DATE: 25-OCT-1996 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Ramos, Robert T. 

(B) REGISTRATION NUMBER: 37,915 

(C) REFERENCE/DOCKET NUMBER: P-CE 2817

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 619-535-9001 

(B) TELEFAX: 619-535-8949 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6604 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 453..6185
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TGACTGAGGC CGGAGCACGG CAAAGATGAG CCTGCCCGCC CGCCTGCTGC CTGGATGCGG 60 

AGGGTGAGGG CTGGCGCACG GGAGGCCGCT GGCTGCGCAT TCTGGGCGCC GAGTGCCCGG 120 

GATGAGCTCA CGCCCGCGTC TGCGGCTCTC TCCACCTGCC GACCTGCCGG GGGCCCACTG 180 

AGCTGACGGC GCACCTGGGC TCCGGCCGCA GCGTGGGGCG CGGCGCCCGG GAGCAGGTGT 240 

GCAGGAGCGC AGCGCGCGGC GAGCGCAGCC CTCGCTCCGG AGCCCGGCCG CGCCGCGTGC 300 

CCGGGCGGCT AGGCAGCGGC GGCGGCGGCG GCGGGCGGCG GGCGGGCGGC GGCCCCCGGG 360 

CAGGTGCCGA GCGGCGAGCG GAGCCGGGCC GGGCGGAGCG CGGGGGGCGA GGCCGGCGCG 420 

TCGCTCGCGG GAGGCCGGGG AGCGGCAGGG GC ATG TGG ATA CTG GCT CTC TCC 473 

Met Trp Ile Leu Ala Leu Ser
1 5 

TTG TTC CAG AGC TTC GCG AAT GTT TTC AGT GAA GAC CTA CAC TCC AGC 521 

Leu Phe Gln Ser Phe Ala Asn Val Phe Ser Glu Asp Leu His Ser Ser
10 15 20 

CTC TAC TTT GTC AAT GCA TCT CTG CAA GAG GTA GTG TTT GCC AGC ACC 569 
Leu Tyr Phe Val Asn Ala Ser Leu Gln Glu Val Val Phe Ala Ser Thr
25 30 35 

ACG GGG ACT CTG GTG CCC TGC CCC GCA GCA GGC ATC CCT CCT GTG ACT 617 
Thr Gly Thr Leu Val Pro Cys Pro Ala Ala Gly Ile Pro Pro Val Thr
40 45 50 55 

CTC AGA TGG TAC CTA GCC ACG GGC GAG GAG ATC TAC GAT GTC CCC GGG 665 
Leu Arg Trp Tyr Leu Ala Thr Gly Glu Glu Ile Tyr Asp Val Pro Gly
60 65 70 



ATC CGC CAC GTC CAC CCC AAC GGC ACT CTC CAA ATT TTC CCC TTC CCT 713 
Ile Arg His Val His Pro Asn Gly Thr Leu Gln Ile Phe Pro Phe Pro
75 80 85 

CCT TCA AGC TTC AGT ACC TTA ATC CAT GAT AAT ACT TAT TAT TGC ACA 761

Pro Ser Ser Phe Ser Thr Leu Ile His Asp Asn Thr Tyr Tyr Cys Thr
90 95 100 

GCT GAA AAT CCT TCA GGG AAA ATT AGA AGT CAG GAT GTC CAC ATC AAG 809 
Ala Glu Asn Pro Ser Gly Lys Ile Arg Ser Gln Asp Val His Ile Lys
105 110 115 

GCT GTT TTA CGG GAG CCC TAT ACA GTC CGT GTG GAG GAC CAG AAA ACC 857 
Ala Val Leu Arg Glu Pro Tyr Thr Val Arg Val Glu Asp Gln Lys Thr
120 125 130 135 

ATG AGA GGC AAT GTT GCG GTC TTC AAG TGC ATT ATC CCC TCC TCG GTG 905 
Met Arg Gly Asn Val Ala Val Phe Lys Cys Ile Ile Pro Ser Ser Val
140 145 150 

GAG GCG TAC ATC ACT GTC GTC TCA TGG GAG AAA GAC ACT GTT TCA CTT 953 
Glu Ala Tyr Ile Thr Val Val Ser Trp Glu Lys Asp Thr Val Ser Leu
155 160 165 

GTC TCA GGA TCT AGA TTT CTC ATC ACA TCC ACG GGA GCC TTG TAT ATT 1001 
Val Ser Gly Ser Arg Phe Leu Ile Thr Ser Thr Gly Ala Leu Tyr Ile
170 175 180 
AAA GAT GTA GAG AAT GAA GAT GGA TTG TAT AAC TAG CGC TGC ATC ACG 1049

Lys Asp Val Gln Asn Glu Asp Gly Leu Tyr Asn Tyr Arg Cys Ile Thr
185 190 195 

CGG CAT CGA TAG ACC GGA GAG ACG AGG CAG AGC AAC AGC GCC AGA CTT 1097

Arg His Arg Tyr Thr Gly Glu Thr Arg Gln Ser Asn Ser Ala Arg Leu
200 205 210 215 

TTT GTA TCA GAG CCA GCG AAC TCA GCC CCA TCC ATA CTG GAT GGG TTT 1145

Phe Val Ser Asp Pro Ala Asn Ser Ala Pro Ser Ile Leu Asp Gly Phe
220 225 230 

GAC CAT CGC AAA GCC ATG GCT GGG CAG CGT GTG GAG CTG CCT TGC AAA 1193 
Asp His Arg Lys Ala Met Ala Gly Gln Arg Val Glu Leu Pro Cys Lys
235 240 245 

GCG CTG GGG CAG CCT GAG CCA GAT TAG CGC TGG CTG AAG GAC AAC ATG 1241 
Ala Leu Gly His Pro Glu Pro Asp Tyr Arg Trp Leu Lys Asp Asn Met 
250 255 260 

CCC CTG GAA CTT TCA GGG AGG TTC CAG AAG ACC GTG ACG GGG CTG CTG 1289 
Pro Leu Glu Leu Ser Gly Arg Phe Gln Lys Thr Val Thr Gly Leu Leu
265 270 275 

ATT GAG AAC ATT CGC CCC TCG GAC TCA GGC AGC TAT GTT TGT GAA GTG 1337 
Ile Glu Asn Ile Arg Pro Ser Asp Ser Gly Ser Tyr Val Cys Glu Val
280 285 290 295 

TCC AAC AGA TAG GGA ACT GCT AAG GTG ATA GGC CGC CTG TAG GTG AAA 1385 
Ser Asn Arg Tyr Gly Thr Ala Lys Val Ile Gly Arg Leu Tyr Val Lys
300 305 310 

CAG CCA CTG AAA GCC ACC ATC AGT CCC AGG AAG GTT AAA AGC AGC GTG 1433

Gln Pro Leu Lys Ala Thr Ile Ser Pro Arg Lys Val Lys Ser Ser Val
315 320 325 

GGT AGC CAA GTT TCC TTG TCC TGC AGC GTG ACA GGA ACT GAG GAC CAG 1481

Gly Ser Gln Val Ser Leu Ser Cys Ser Val Thr Gly Thr Glu Asp Gln
330 335 340 

GAA CTG TCC TGG TAG CGC AAT GGT GAA ATC CTC AAC CCT GGA AAA AAT 1529

Glu Leu Ser Trp Tyr Arg Asn Gly Glu Ile Leu Asn Pro Gly Lys Asn
345 350 355 

GTG AGG ATC ACA GGG ATC AAC GAC GAA AAC CTT ATA ATG GAT CAG ATG 1577

Val Arg Ile Thr Gly Ile Asn His Glu Asn Leu Ile Met Asp His Met
360 365 370 375 

GTG AAA AGT GAC GGG GGC GCA TAG CAG TGC TTT GTG CGC AAG GAC AAG 1625 
Val Lys Ser Asp Gly Gly Ala Tyr Gln Cys Phe Val Arg Lys Asp Lys
380 385 390 

CTG TCC GCT CAA GAC TAT GTG CAG GTG GTC CTT GAA GAT GGA ACT CCC 1673

Leu Ser Ala Gln Asp Tyr Val Gln Val Val Leu Glu Asp Gly Thr Pro
395 400 405 

AAA ATT ATT TCT GCC TTT AGT GAA AAG GTG GTG AGT CCA GCA GAG GCG 1721 
Lys Ile Ile Ser Ala Phe Ser Glu Lys Val Val Ser Pro Ala Glu Pro
410 415 420 

GTT TCC CTT ATG TGC AAC GTG AAG GGA ACA CCT TTG CCC ACG ATC ACG 1769

Val Ser Leu Met Cys Asn Val Lys Gly Thr Pro Leu Pro Thr Ile Thr
425 430 435 
TGG ACC CTG GAG GAT GAG CCG ATT CTC AAG GGT GGC AGT CAC CGC ATC 1817
Trp Thr Leu Asp Asp Asp Pro Ile Leu Lys Gly Gly Ser His Arg Ile
440 445 450 455 

AGC GAG ATG ATC ACG TCG GAG GGG AAC GTG GTC AGC TAG CTG AAC ATC 1865

Ser Gln Met Ile Thr Ser Glu Gly Asn Val Val Ser Tyr Leu Asn Ile
460 465 470 

TCC AGC TCC CAG GTC CGG GAG GGG GGA GTC TAG CGC TGC ACT GGC AAC 1913 
Ser Ser Ser Gln Val Arg Asp Gly Gly Val Tyr Arg Cys Thr Ala Asn
475 480 485 

AAC TCG GGG GGA GTC GTC CTG TAG CAG GCT CGA ATA AAC GTA AGA GGG 1961 
Asn Ser Ala Gly Val Val Leu Tyr Gln Ala Arg Ile Asn Val Arg Gly
490 495 500 

CCT GCA AGC ATT CGA CCA ATG AAA AAC ATC ACA GGA ATA GGA GGA CGG 2009 
Pro Ala Ser Ile Arg Pro Met Lys Asn Ile Thr Ala Ile Ala Gly Arg
505 510 515 

GAC ACA TAG ATT CAC TGT CGT GTG ATT GGC TAT CCG TAT TAG TCC ATT 2057

Asp Thr Tyr Ile His Cys Arg Val Ile Gly Tyr Pro Tyr Tyr Ser Ile
520 525 530 535 

AAA TGG TAG AAG AAC TCT AAC CTG CTT CCT TTC AAC CAC CGC CAA GTG 2105 
Lys Trp Tyr Lys Asn Ser Asn Leu Leu Pro Phe Asn His Arg Gln Val
540 545 550 

GCA TTT GAG AAC AAT GGA ACT CTT AAA CTT TCA GAT GTG CAA AAG GAA 2153 
Ala Phe Glu Asn Asn Gly Thr Leu Lys Leu Ser Asp Val Gln Lys Glu
555 560 565 

GTG GAC GAG GGG GAG TAG ACG TGC AAC GTG TTG GTT CAA CCA CAA CTC 2201

Val Asp Glu Gly Glu Tyr Thr Cys Asn Val Leu Val Gln Pro Gln Leu
570 575 580 

TCC ACC AGC CAG AGC GTC CAC GTG ACC GTG AAA GTT CCG CCT TTC ATA 2249

Ser Thr Ser Gln Ser Val His Val Thr Val Lys Val Pro Pro Phe Ile
585 590 595 

CAA CCG TTT GAG TTT CCA AGA TTC TCC ATT GGG CAG CGG GTC TTC ATC 2297 
Gln Pro Phe Glu Phe Pro Arg Phe Ser Ile Gly Gln Arg Val Phe Ile
600 605 610 615 

CCG TGT GTT GTG GTC TCA GGG GAC TTA CCG ATC ACG ATC ACC TGG CAG 2345

Pro Cys Val Val Val Ser Gly Asp Leu Pro Ile Thr Ile Thr Trp Gln
620 625 630 

AAG GAT GGC CGG CCA ATC CCT GGG AGC CTT GGG GTG ACC ATT GAC AAT 2393 
Lys Asp Gly Arg Pro Ile Pro Gly Ser Leu Gly Val Thr Ile Asp Asn
635 640 645 

ATT GAC TTC ACG AGC TCC TTG AGG ATT TCC AAT CTC TCG CTC ATG CAC 2441 
Ile Asp Phe Thr Ser Ser Leu Arg Ile Ser Asn Leu Ser Leu Met His
650 655 660 

AAT GGG AAT TAC ACC TGC ATA GCC CGG AAT GAG GCC GCC GCT GTG GAG 2489

Asn Gly Asn Tyr Thr Cys Ile Ala Arg Asn Glu Ala Ala Ala Val Glu
665 670 675 

CAC CAA AGC CAG TTG ATT GTC AGA GTT CCT CGC AAG TTT GTG GTT CAG 2537 
His Gln Ser Gln Leu Ile Val Arg Val Pro Pro Lys Phe Val Val Gln
680 685 690 695 
CCA CGG GAC CAG GAC GGG ATT TAT GGC AAA GCA GTC ATC CTC AAT TGT 2585
Pro Arg Asp Gln Asp Gly Ile Tyr Gly Lys Ala Val Ile Leu Asn Cys
700 705 710 

TCT GCT GAG GGT TAG CCT GTA CCT ACC ATC GTG TGG AAA TTC TCT AAA 2633 
Ser Ala Glu Gly Tyr Pro Val Pro Thr Ile Val Trp Lys Phe Ser Lys
715 720 725 

GGT GCT GGG GTT CCC CAG TTC CAG CCA ATT GCC CTA AAT GGC CGA ATC 2681

Gly Ala Gly Val Pro Gln Phe Gln Pro Ile Ala Leu Asn Gly Arg Ile
730 735 740 

CAA GTT CTC AGC AAT GGG TCG TTG CTG ATC AAG CAT GTC GTG GAG GAA 2729

Gln Val Leu Ser Asn Gly Ser Leu Leu Ile Lys His Val Val Glu Glu
745 750 755 

GAC AGT GGC TAG TAC CTC TGC AAG GTC AGC AAC GAT GTG GGC GCA GAC 2777

Asp Ser Gly Tyr Tyr Leu Cys Lys Val Ser Asn Asp Val Gly Ala Asp 
760 765 770 775 

GTC AGC AAG TCC ATG TAC CTC ACG GTT AAA ATT CCT GCG ATG ATA ACA 2825 
Val Ser Lys Ser Met Tyr Leu Thr Val Lys Ile Pro Ala Met Ile Thr
780 785 790 

TCC TAT CCA AAT ACT ACC CTG GCC ACG CAG GGG CAG AAA AAG GAG ATG 2873 
Ser Tyr Pro Asn Thr Thr Leu Ala Thr Gln Gly Gln Lys Lys Glu Met
795 800 805 

AGC TGC ACG GCG CAT GGT GAG AAG CCC ATT ATA GTC CGC TGG GAG AAG 2921 
Ser Cys Thr Ala His Gly Glu Lys Pro Ile Ile Val Arg Trp Glu Lys
810 815 820 

GAG GAC CGA ATC ATT AAC CCT GAG ATG GCC CGT TAT CTT GTG TCC ACC 2969 
Glu Asp Arg Ile Ile Asn Pro Glu Met Ala Arg Tyr Leu Val Ser Thr
825 830 835 

AAG GAG GTG GGA GAA GAG GTG ATT TCT ACT CTG CAG ATT TTG CCA ACT 3017 
Lys Glu Val Gly Glu Glu Val Ile Ser Thr Leu Gln Ile Leu Pro Thr
840 845 850 855 

GTG AGA GAA GAT TCT GGT TTC TTT TCC TGC CAT GCT ATT AAT TCT TAT 3065 
Val Arg Glu Asp Ser Gly Phe Phe Ser Cys His Ala Ile Asn Ser Tyr
860 865 870 

GGG GAG GAC CGT GGA ATA ATT CAG CTC ACA GTG CAA GAG CCC CCA GAC 3113 
Gly Glu Asp Arg Gly Ile Ile Gln Leu Thr Val Gln Glu Pro Pro Asp
875 880 885 

CCT CCC GAA ATT GAG ATC AAA GAT GTC AAA GCA CGC ACA ATT ACG CTC 3161 
Pro Pro Glu Ile Glu Ile Lys Asp Val Lys Ala Arg Thr Ile Thr Leu
890 895 900 

AGG TGG ACC ATG GGG TTT GAT GGA AAC AGT CCC ATC ACA GGC TAC GAT 3209 
Arg Trp Thr Met Gly Phe Asp Gly Asn Ser Pro Ile Thr Gly Tyr Asp
905 910 915 

ATT GAA TGC AAA AAT AAA TCA GAC TCC TGG GAT TCT GCT CAG AGA ACC 3257 
Ile Glu Cys Lys Asn Lys Ser Asp Ser Trp Asp Ser Ala Gln Arg Thr
920 925 930 935 

AAA GAT GTT TCC CCT CAG CTG AAC TCG GCC ACC ATC ATT GAT ATC CAC 3305 
Lys Asp Val Ser Pro Gln Leu Asn Ser Ala Thr Ile Ile Asp Ile His
940 945 950 
CCT TCC TCC ACC TAG AGC ATC CGC ATG TAG GCC AAG AAC CGG ATT GGC 3353
Pro Ser Ser Thr Tyr Ser Ile Arg Met Tyr Ala Lys Asn Arg Ile Gly
955 960 965 

AAG AGC GAG CCC AGC AAC GAG CTC ACC ATC ACG GCG GAC GAG GCA GCT 3401

Lys Ser Glu Pro Ser Asn Glu Leu Thr Ile Thr Ala Asp Glu Ala Ala
970 975 980 

CCT GAT GGT CCA CCT CAG GAA GTT CAC CTG GAG CCT ATA TCA TCT CAG 3449

Pro Asp Gly Pro Pro Gln Glu Val His Leu Glu Pro Ile Ser Ser Gln
985 990 995 

AGC ATC AGG GTC ACA TGG AAG GCT CCC AAG AAA CAT TTG CAA AAT GGG 3497

Ser Ile Arg Val Thr Trp Lys Ala Pro Lys Lys His Leu Gln Asn Gly
1000 1005 1010 1015 

ATT ATC CGT GGC TAC CAA ATA GGT TAC CGA GAG TAC AGC ACT GGG GGT 3545 
Ile Ile Arg Gly Tyr Gln Ile Gly Tyr Arg Glu Tyr Ser Thr Gly Gly
1020 1025 1030 

AAC TTC CAA TTC AAC ATT ATC AGT GTC GAC ACC AGC GGG GAC AGT GAG 3593

Asn Phe Gln Phe Asn Ile Ile Ser Val Asp Thr Ser Gly Asp Ser Glu
1035 1040 1045 

GTT TAC ACC CTG GAC AAC CTG AAT AAG TTC ACT CAG TAC GGC CTG GTG 3641 
Val Tyr Thr Leu Asp Asn Leu Asn Lys Phe Thr Gln Tyr Gly Leu Val
1050 1055 1060 

GTG CAG GCC TGT AAC CGG GCC GGC ACG GGG CCT TCT TCT CAG GAA ATC 3689

Val Gln Ala Cys Asn Arg Ala Gly Thr Gly Pro Ser Ser Gln Glu Ile
1065 1070 1075 

ATC ACC ACC ACT CTC GAG GAT GTG CCC AGT TAC CCC CCC GAA AAT GTC 3737 
Ile Thr Thr Thr Leu Glu Asp Val Pro Ser Tyr Pro Pro Glu Asn Val
1080 1085 1090 1095 

CAA GCC ATA GCA ACA TCA CCA GAA AGC ATA TCA ATA TCC TGG TCC ACA 3785 
Gln Ala Ile Ala Thr Ser Pro Glu Ser Ile Ser Ile Ser Trp Ser Thr
1100 1105 1110 

CTT TCC AAG GAA GCC TTG AAT GGA ATT CTC CAG GGG TTC AGA GTC ATT 3833 
Leu Ser Lys Glu Ala Leu Asn Gly Ile Leu Gln Gly Phe Arg Val Ile
1115 1120 1125 

TAC TGG GCC AAC CTC ATG GAC GGA GAG CTG GGT GAG ATT AAA AAC ATC 3881 
Tyr Trp Ala Asn Leu Met Asp Gly Glu Leu Gly Glu Ile Lys Asn Ile
1130 1135 1140

ACC ACC ACA CAG CCT TCA CTG GAG CTG GAC GGG CTG GAA AAG TAC ACC 3929 
Thr Thr Thr Gln Pro Ser Leu Glu Leu Asp Gly Leu Glu Lys Tyr Thr
1145 1150 1155 

AAC TAC AGC ATC CAG GTG CTG GCC TTC ACC CGC GCA GGA GAC GGG GTC 3977 
Asn Tyr Ser Ile Gln Val Leu Ala Phe Thr Arg Ala Gly Asp Gly Val
1160 1165 1170 1175 

AGG AGT GAG CAG ATC TTC ACC CGG ACC AAA GAG GAT GTT CCA GGT CCT 4025

Arg Ser Glu Gln Ile Phe Thr Arg Thr Lys Glu Asp Val Pro Gly Pro
1180 1185 1190 

CCC GCG GGT GTG AAG GCA GCG GCG GCC TCA GCC TCC ATG GTC TTT GTG 4073

Pro Ala Gly Val Lys Ala Ala Ala Ala Ser Ala Ser Met Val Phe Val 
1195 1200 1205 
TCC TGG CTT CCC CCT CTC AAG CTG AAC GGC ATC ATC CGA AAG TAG ACT 4121
Ser Trp Leu Pro Pro Leu Lys Leu Asn Gly Ile Ile Arg Lys Tyr Thr
1210 1215 1220 

GTA TTC TGG TCC CAC CCC TAT CCC ACA GTG ATC AGC GAG TTT GAG GCC 4169 
Val Phe Cys Ser His Pro Tyr Pro Thr Val Ile Ser Glu Phe Glu Ala
1225 1230 1235 

TCT CCC GAG TGG TTT TCC TAG AGA ATT CCC AAC CTG AGT AGG AAT CGT 4217 
Ser Pro Asp Ser Phe Ser Tyr Arg Ile Pro Asn Leu Ser Arg Asn Arg
1240 1245 1250 1255 

CAG TAG AGC GTC TGG GTG GTG GCT GTT ACT TCA GCC GGA AGA GGC AAC 4265 
Gln Tyr Ser Val Trp Val Val Ala Val Thr Ser Ala Gly Arg Gly Asn
1260 1265 1270 

AGC AGT GAA ATC ATC ACA GTC GAG CCA CTA GCA AAA GCT CCT GCA CGA 4313 
Ser Ser Glu Ile Ile Thr Val Glu Pro Leu Ala Lys Ala Pro Ala Arg
1275 1280 1285 

ATC CTG ACC TTC AGT GGG ACA GTG ACT ACT CCA TGG ATG AAA GAC ATT 4361

Ile Leu Thr Phe Ser Gly Thr Val Thr Thr Pro Trp Met Lys Asp Ile
1290 1295 1300 

GTC TTG CCT TGT AAG GCT GTT GGG GAC CCT TCT CCT GCA GTC AAA TGG 4 4 09 

Val Leu Pro Cys Lys Ala Val Gly Asp Pro Ser Pro Ala Val Lys Trp 
1305 1310 1315 

ATG AAA GAC AGT AAC GGG ACA CCC AGT CTA GTA ACG ATT GAT GGG CGG 4 457 

Met Lys Asp Ser Asn Gly Thr Pro Ser Leu Val Thr He Asp Gly Arg 
1320 1325 1330 1335 

AGG AGC ATC TTT AGC AAC GGA AGC TTC ATT ATT CGC ACG GTG AAA GCA 4 505 

Arg Ser He Phe Ser Asn Gly Ser Phe He He Arg Thr Val Lys Ala 
1340 1345 1350 

GAA GAC TCC GGC TAT TAC AGC TGC ATT GCC AAT AAC AAC TGG GGA TCT . 4 553 

Glu ASP Ser Gly Tyr Tyr Ser Cys He Ala Asn Asn Asn Trp Gly Ser 
1355 1360 1365 

GAT GAA ATT ATT TTA AAC TTA CAA GTA CAA GTT CCA CCA GAT CAG CCT 4 601 

Asp Glu He He Leu Asn Leu Gin Val Gin Val Pro Pro Asp Gin Pro 
1370 1375 1380 

CGG CTT ACA GTC TCC AAG ACC ACG TCT TCC TCC ATC ACC CTT TCT TGG 4 64 9 

Arg Leu Thr Val Ser Lys Thr Thr Ser Ser Ser He Thr Leu Ser Trp 
1385 1390 1395 

CTC CCT GGA GAC AAC GGG GGC AGC TCT ATC AGA GGA TAC ATA CTG CAG 4 697 

Leu Pro Gly Asp Asn Gly Gly Ser Ser He Arg Gly Tyr He Leu Gin 
1400 1405 1410 1415 

TAC TCC GAG GAC AAT AGT GAG CAG TGG GGG AGT TTT CCA ATC AGC CCC 4745 
Tvr Ser Glu Asp Asn Ser Glu Gin Trp Gly Ser Phe Pro He Ser Pro 
1420 1425 1430 

AGC GAA CGT TCC TAT CGC TTG GAA AAT CTC AAA TGT GGG ACT TGG TAT 4793
Ser Glu Arg Ser Tyr Arg Leu Glu Asn Leu Lys Cys Gly Thr Trp Tyr
1435 1440 1445

AAG TTC ACA CTG ACA GCC CAA AAT GGA GTG GGC CCA GGG CGC ATA AGT 4841
Lys Phe Thr Leu Thr Ala Gln Asn Gly Val Gly Pro Gly Arg Ile Ser
1450 1455 1460

GAA ATC ATA GAA GCA AAG ACC TTA GGA PJ^ GAG CCC CAG TTC TCA AAG 4889
Glu Ile Ile Glu Ala Lys Thr Leu Gly Lys Glu Pro Gln Phe Ser Lys
1465 1470 1475

GAG CAG GAG CTG TTT GCC AGC ATC AAC ACC ACA CGC GTG AGG CTG AAC 4937
Glu Gln Glu Leu Phe Ala Ser Ile Asn Thr Thr Arg Val Arg Leu Asn
1480 1485 1490 1495

CTC ATT GGC TGG AAT GAT GGC GGC TGC CCC ATC ACC TCC TTC ACA CTA 4985
Leu Ile Gly Trp Asn Asp Gly Gly Cys Pro Ile Thr Ser Phe Thr Leu
1500 1505 1510

GAG TAC AGG CCC TTT GGG ACC ACA GTT TGG ACC ACA GCT CAG AGG ACC 5033
Glu Tyr Arg Pro Phe Gly Thr Thr Val Trp Thr Thr Ala Gln Arg Thr
1515 1520 1525

TCT CTC TCC AAG TCC TAC ATC CTG TAT GAC CTG CAG GAA GCC ACC TGG 5081
Ser Leu Ser Lys Ser Tyr Ile Leu Tyr Asp Leu Gln Glu Ala Thr Trp
1530 1535 1540

TAT GAG CTG CAG ATG CGG GTG TGC AAC AGT GCG GGC TGC GCG GAG AAG 5129
Tyr Glu Leu Gln Met Arg Val Cys Asn Ser Ala Gly Cys Ala Glu Lys
1545 1550 1555

CAG GCC AAC TTC GCT ACG CTG AAC TAC GAT GGC AGT ACA ATT CCT CCA 5177
Gln Ala Asn Phe Ala Thr Leu Asn Tyr Asp Gly Ser Thr Ile Pro Pro
1560 1565 1570 1575

CTC ATT AAG TCA GTT GTG CAA AAC GAA GAA GGG CTG ACG ACC AAC GAG 5225
Leu Ile Lys Ser Val Val Gln Asn Glu Glu Gly Leu Thr Thr Asn Glu
1580 1585 1590

GGG CTC AAG ATG CTG GTG ACC ATC TCC TGT ATC CTG GTG GGG GTC TTG 5273
Gly Leu Lys Met Leu Val Thr Ile Ser Cys Ile Leu Val Gly Val Leu
1595 1600 1605

CTG CTG TTT GTG CTC CTG CTG GTT GTG CGG AGG AGG CGG CGG GAG CAG 5321
Leu Leu Phe Val Leu Leu Leu Val Val Arg Arg Arg Arg Arg Glu Gln
1610 1615 1620

AGG CTA AAG AGG CTG CGA GAT GCA AAG AGT TTA GCT GAA ATG CTC ATG 5369
Arg Leu Lys Arg Leu Arg Asp Ala Lys Ser Leu Ala Glu Met Leu Met
1625 1630 1635

AGT AAG AAT ACC CGG ACT TCA GAT ACG TTA AGC AAG CAA CAG CAG ACC 5417
Ser Lys Asn Thr Arg Thr Ser Asp Thr Leu Ser Lys Gln Gln Gln Thr
1640 1645 1650 1655

CTG CGA ATG CAC ATC GAC ATA CCC AGG GCT CAG CTT TTG ATT GAA GAG 5465
Leu Arg Met His Ile Asp Ile Pro Arg Ala Gln Leu Leu Ile Glu Glu
1660 1665 1670

AGA GAC ACG ATG GAG ACC ATT GAT GAT CGC TCC ACG GTT CTG TTG ACG 5513
Arg Asp Thr Met Glu Thr Ile Asp Asp Arg Ser Thr Val Leu Leu Thr
1675 1680 1685

GAT GCT GAC TTT GGA GAG GCA GCT AAG CAG AAG TCC CTG ACG GTC ACT 5561
Asp Ala Asp Phe Gly Glu Ala Ala Lys Gln Lys Ser Leu Thr Val Thr
1690 1695 1700

CAC ACG GTC CAT TAC CAA TCG GTG TCT CAG GCC ACT GGG CCC TTA GTG 5609
His Thr Val His Tyr Gln Ser Val Ser Gln Ala Thr Gly Pro Leu Val
1705 1710 1715

GAT GTT TCA GAC GCT CGG CCG GGA ACG AAT CCC ACC ACC AGG AGG AAT 5657
Asp Val Ser Asp Ala Arg Pro Gly Thr Asn Pro Thr Thr Arg Arg Asn
1720 1725 1730 1735

GCC AAG GCT GGG CCC ACA GCG AGA AAC CGG TAT GCC AGC CAG TGG ACC 5705
Ala Lys Ala Gly Pro Thr Ala Arg Asn Arg Tyr Ala Ser Gln Trp Thr
1740 1745 1750

CTC AAC CGA CCC CAC CCC ACC ATC TCA GCA CAC ACC CTC ACC ACA GAC 5753
Leu Asn Arg Pro His Pro Thr Ile Ser Ala His Thr Leu Thr Thr Asp
1755 1760 1765

TGG AGG CTG CCA ACA CCC AGG GCT GCA GGA TCA GTA GAC AAA GAG AGC 5801
Trp Arg Leu Pro Thr Pro Arg Ala Ala Gly Ser Val Asp Lys Glu Ser
1770 1775 1780

GAC AGT TAC AGC GTC AGC CCC TCG CAA GAC ACA GAT CGA GCA AGA AGC 5849
Asp Ser Tyr Ser Val Ser Pro Ser Gln Asp Thr Asp Arg Ala Arg Ser
1785 1790 1795

AGC ATG GTC TCC ACA GAA AGT GCC TCC TCC ACT TAC GAA GAA CTG GCC 5897
Ser Met Val Ser Thr Glu Ser Ala Ser Ser Thr Tyr Glu Glu Leu Ala
1800 1805 1810 1815

AGG GCC TAC GAA CAC GCC AAG ATG GAA GAG CAA CTG AGG CAC GCC AAG 5945
Arg Ala Tyr Glu His Ala Lys Met Glu Glu Gln Leu Arg His Ala Lys
1820 1825 1830

TTC ACC ATC ACG GAG TGC TTC ATA TCA GAC ACG TCA TCG GAG CAG TTG 5993
Phe Thr Ile Thr Glu Cys Phe Ile Ser Asp Thr Ser Ser Glu Gln Leu
1835 1840 1845

ACG GCA GGG ACA AAT GAG TAC ACG GAC AGT CTG ACC TCC AGC ACC CCT 6041
Thr Ala Gly Thr Asn Glu Tyr Thr Asp Ser Leu Thr Ser Ser Thr Pro
1850 1855 1860

TCC GAA TCG GGA ATC TGC AGG TTC ACT GCA TCT CCC CCC AAA CCT CAG 6089
Ser Glu Ser Gly Ile Cys Arg Phe Thr Ala Ser Pro Pro Lys Pro Gln
1865 1870 1875

GAT GGA GGA AGA GTA ATG AAT ATG GCA GTT CCA AAG GCA ATC GGC CAG 6137
Asp Gly Gly Arg Val Met Asn Met Ala Val Pro Lys Ala Ile Gly Gln
1880 1885 1890 1895

GTG ACC TCA TAC ATT TGC CTC CAT ACC TTA GAA TGG ACT TTT TGT TAAACCGAGG
Val Thr Ser Tyr Ile Cys Leu His Thr Leu Glu Trp Thr Phe Cys
1900 1905 1910

TGGTCCAGGC ACCAGCAGGG ACCTGAGCTT AGGACAAGCA TGCTTGGAAC CTCAGAAAAG 6252

CCGGACCCTG AAGCGCCCCA CGGTCCTGGA GCCCATCCCG ATGGAAGCCG CCTCCTCCGC 6312

CTCCTCCACG AGAGAAGGAC AGTCGTGGCA GCCGGGGGCC GTGGCCACAT TACCTCAGCG 6372

GGAGGGAGCA GAGCTGGGAC AGGCAGCTAA AATGAGCAGC TCCCAAGAAT CACTGCTCGA 6432

CTCCCGGGGC CATTTGAAAG GAAACAATCC TTACGCAAAA TCTTACACCC TGGTATAACA 6492

GACAGCATGA CTGGACAGCG GTTGTAAATA CAATTCAAAC AATTCAATCA AAGCTACCTT 6552

TTTTTTACGG AATTCCTUITA TTTATAATTA AAGAAAATTG CCAAAATATA TT 6604

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1910 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Trp Ile Leu Ala Leu Ser Leu Phe Gln Ser Phe Ala Asn Val Phe
1 5 10 15

Ser Glu Asp Leu His Ser Ser Leu Tyr Phe Val Asn Ala Ser Leu Gln
20 25 30

Glu Val Val Phe Ala Ser Thr Thr Gly Thr Leu Val Pro Cys Pro Ala
35 40 45

Ala Gly Ile Pro Pro Val Thr Leu Arg Trp Tyr Leu Ala Thr Gly Glu
50 55 60

Glu Ile Tyr Asp Val Pro Gly Ile Arg His Val His Pro Asn Gly Thr
65 70 75 80

Leu Gln Ile Phe Pro Phe Pro Pro Ser Ser Phe Ser Thr Leu Ile His
85 90 95

Asp Asn Thr Tyr Tyr Cys Thr Ala Glu Asn Pro Ser Gly Lys Ile Arg
100 105 110

Ser Gln Asp Val His Ile Lys Ala Val Leu Arg Glu Pro Tyr Thr Val
115 120 125

Arg Val Glu Asp Gln Lys Thr Met Arg Gly Asn Val Ala Val Phe Lys
130 135 140

Cys Ile Ile Pro Ser Ser Val Glu Ala Tyr Ile Thr Val Val Ser Trp
145 150 155 160

Glu Lys Asp Thr Val Ser Leu Val Ser Gly Ser Arg Phe Leu Ile Thr
165 170 175

Ser Thr Gly Ala Leu Tyr Ile Lys Asp Val Gln Asn Glu Asp Gly Leu
180 185 190

Tyr Asn Tyr Arg Cys Ile Thr Arg His Arg Tyr Thr Gly Glu Thr Arg
195 200 205

Gln Ser Asn Ser Ala Arg Leu Phe Val Ser Asp Pro Ala Asn Ser Ala
210 215 220

Pro Ser Ile Leu Asp Gly Phe Asp His Arg Lys Ala Met Ala Gly Gln
225 230 235 240

Arg Val Glu Leu Pro Cys Lys Ala Leu Gly His Pro Glu Pro Asp Tyr
245 250 255

Arg Trp Leu Lys Asp Asn Met Pro Leu Glu Leu Ser Gly Arg Phe Gln
260 265 270

Lys Thr Val Thr Gly Leu Leu Ile Glu Asn Ile Arg Pro Ser Asp Ser
275 280 285

Gly Ser Tyr Val Cys Glu Val Ser Asn Arg Tyr Gly Thr Ala Lys Val
290 295 300

Ile Gly Arg Leu Tyr Val Lys Gln Pro Leu Lys Ala Thr Ile Ser Pro
305 310 315 320

Arg Lys Val Lys Ser Ser Val Gly Ser Gln Val Ser Leu Ser Cys Ser
325 330 335

Val Thr Gly Thr Glu Asp Gln Glu Leu Ser Trp Tyr Arg Asn Gly Glu
340 345 350

Ile Leu Asn Pro Gly Lys Asn Val Arg Ile Thr Gly Ile Asn His Glu
355 360 365

Asn Leu Ile Met Asp His Met Val Lys Ser Asp Gly Gly Ala Tyr Gln
370 375 380

Cys Phe Val Arg Lys Asp Lys Leu Ser Ala Gln Asp Tyr Val Gln Val
385 390 395 400

Val Leu Glu Asp Gly Thr Pro Lys Ile Ile Ser Ala Phe Ser Glu Lys
405 410 415

Val Val Ser Pro Ala Glu Pro Val Ser Leu Met Cys Asn Val Lys Gly
420 425 430

Thr Pro Leu Pro Thr Ile Thr Trp Thr Leu Asp Asp Asp Pro Ile Leu
435 440 445

Lys Gly Gly Ser His Arg Ile Ser Gln Met Ile Thr Ser Glu Gly Asn
450 455 460

Val Val Ser Tyr Leu Asn Ile Ser Ser Ser Gln Val Arg Asp Gly Gly
465 470 475 480

Val Tyr Arg Cys Thr Ala Asn Asn Ser Ala Gly Val Val Leu Tyr Gln
485 490 495

Ala Arg Ile Asn Val Arg Gly Pro Ala Ser Ile Arg Pro Met Lys Asn
500 505 510

Ile Thr Ala Ile Ala Gly Arg Asp Thr Tyr Ile His Cys Arg Val Ile
515 520 525

Gly Tyr Pro Tyr Tyr Ser Ile Lys Trp Tyr Lys Asn Ser Asn Leu Leu
530 535 540

Pro Phe Asn His Arg Gln Val Ala Phe Glu Asn Asn Gly Thr Leu Lys
545 550 555 560

Leu Ser Asp Val Gln Lys Glu Val Asp Glu Gly Glu Tyr Thr Cys Asn
565 570 575

Val Leu Val Gln Pro Gln Leu Ser Thr Ser Gln Ser Val His Val Thr
580 585 590

Val Lys Val Pro Pro Phe Ile Gln Pro Phe Glu Phe Pro Arg Phe Ser
595 600 605

Ile Gly Gln Arg Val Phe Ile Pro Cys Val Val Val Ser Gly Asp Leu
610 615 620

Pro Ile Thr Ile Thr Trp Gln Lys Asp Gly Arg Pro Ile Pro Gly Ser
625 630 635 640

Leu Gly Val Thr Ile Asp Asn Ile Asp Phe Thr Ser Ser Leu Arg Ile
645 650 655

Ser Asn Leu Ser Leu Met His Asn Gly Asn Tyr Thr Cys Ile Ala Arg
660 665 670

Asn Glu Ala Ala Ala Val Glu His Gln Ser Gln Leu Ile Val Arg Val
675 680 685

Pro Pro Lys Phe Val Val Gln Pro Arg Asp Gln Asp Gly Ile Tyr Gly
690 695 700

Lys Ala Val Ile Leu Asn Cys Ser Ala Glu Gly Tyr Pro Val Pro Thr
705 710 715 720

Ile Val Trp Lys Phe Ser Lys Gly Ala Gly Val Pro Gln Phe Gln Pro
725 730 735

Ile Ala Leu Asn Gly Arg Ile Gln Val Leu Ser Asn Gly Ser Leu Leu
740 745 750

Ile Lys His Val Val Glu Glu Asp Ser Gly Tyr Tyr Leu Cys Lys Val
755 760 765

Ser Asn Asp Val Gly Ala Asp Val Ser Lys Ser Met Tyr Leu Thr Val
770 775 780

Lys Ile Pro Ala Met Ile Thr Ser Tyr Pro Asn Thr Thr Leu Ala Thr
785 790 795 800

Gln Gly Gln Lys Lys Glu Met Ser Cys Thr Ala His Gly Glu Lys Pro
805 810 815

Ile Ile Val Arg Trp Glu Lys Glu Asp Arg Ile Ile Asn Pro Glu Met
820 825 830

Ala Arg Tyr Leu Val Ser Thr Lys Glu Val Gly Glu Glu Val Ile Ser
835 840 845

Thr Leu Gln Ile Leu Pro Thr Val Arg Glu Asp Ser Gly Phe Phe Ser
850 855 860

Cys His Ala Ile Asn Ser Tyr Gly Glu Asp Arg Gly Ile Ile Gln Leu
865 870 875 880

Thr Val Gln Glu Pro Pro Asp Pro Pro Glu Ile Glu Ile Lys Asp Val
885 890 895

Lys Ala Arg Thr Ile Thr Leu Arg Trp Thr Met Gly Phe Asp Gly Asn
900 905 910

Ser Pro Ile Thr Gly Tyr Asp Ile Glu Cys Lys Asn Lys Ser Asp Ser
915 920 925

Trp Asp Ser Ala Gln Arg Thr Lys Asp Val Ser Pro Gln Leu Asn Ser
930 935 940

Ala Thr Ile Ile Asp Ile His Pro Ser Ser Thr Tyr Ser Ile Arg Met
945 950 955 960

Tyr Ala Lys Asn Arg Ile Gly Lys Ser Glu Pro Ser Asn Glu Leu Thr
965 970 975

Ile Thr Ala Asp Glu Ala Ala Pro Asp Gly Pro Pro Gln Glu Val His
980 985 990

Leu Glu Pro Ile Ser Ser Gln Ser Ile Arg Val Thr Trp Lys Ala Pro
995 1000 1005

Lys Lys His Leu Gln Asn Gly Ile Ile Arg Gly Tyr Gln Ile Gly Tyr
1010 1015 1020

Arg Glu Tyr Ser Thr Gly Gly Asn Phe Gln Phe Asn Ile Ile Ser Val
1025 1030 1035 1040

Asp Thr Ser Gly Asp Ser Glu Val Tyr Thr Leu Asp Asn Leu Asn Lys
1045 1050 1055

Phe Thr Gln Tyr Gly Leu Val Val Gln Ala Cys Asn Arg Ala Gly Thr
1060 1065 1070

Gly Pro Ser Ser Gln Glu Ile Ile Thr Thr Thr Leu Glu Asp Val Pro
1075 1080 1085

Ser Tyr Pro Pro Glu Asn Val Gln Ala Ile Ala Thr Ser Pro Glu Ser
1090 1095 1100

Ile Ser Ile Ser Trp Ser Thr Leu Ser Lys Glu Ala Leu Asn Gly Ile
1105 1110 1115 1120

Leu Gln Gly Phe Arg Val Ile Tyr Trp Ala Asn Leu Met Asp Gly Glu
1125 1130 1135

Leu Gly Glu Ile Lys Asn Ile Thr Thr Thr Gln Pro Ser Leu Glu Leu
1140 1145 1150

Asp Gly Leu Glu Lys Tyr Thr Asn Tyr Ser Ile Gln Val Leu Ala Phe
1155 1160 1165

Thr Arg Ala Gly Asp Gly Val Arg Ser Glu Gln Ile Phe Thr Arg Thr
1170 1175 1180

Lys Glu Asp Val Pro Gly Pro Pro Ala Gly Val Lys Ala Ala Ala Ala
1185 1190 1195 1200

Ser Ala Ser Met Val Phe Val Ser Trp Leu Pro Pro Leu Lys Leu Asn
1205 1210 1215

Gly Ile Ile Arg Lys Tyr Thr Val Phe Cys Ser His Pro Tyr Pro Thr
1220 1225 1230

Val Ile Ser Glu Phe Glu Ala Ser Pro Asp Ser Phe Ser Tyr Arg Ile
1235 1240 1245

Pro Asn Leu Ser Arg Asn Arg Gln Tyr Ser Val Trp Val Val Ala Val
1250 1255 1260

Thr Ser Ala Gly Arg Gly Asn Ser Ser Glu Ile Ile Thr Val Glu Pro
1265 1270 1275 1280

Leu Ala Lys Ala Pro Ala Arg Ile Leu Thr Phe Ser Gly Thr Val Thr
1285 1290 1295

Thr Pro Trp Met Lys Asp Ile Val Leu Pro Cys Lys Ala Val Gly Asp
1300 1305 1310

Pro Ser Pro Ala Val Lys Trp Met Lys Asp Ser Asn Gly Thr Pro Ser
1315 1320 1325

Leu Val Thr Ile Asp Gly Arg Arg Ser Ile Phe Ser Asn Gly Ser Phe
1330 1335 1340

Ile Ile Arg Thr Val Lys Ala Glu Asp Ser Gly Tyr Tyr Ser Cys Ile
1345 1350 1355 1360

Ala Asn Asn Asn Trp Gly Ser Asp Glu Ile Ile Leu Asn Leu Gln Val
1365 1370 1375

Gln Val Pro Pro Asp Gln Pro Arg Leu Thr Val Ser Lys Thr Thr Ser
1380 1385 1390

Ser Ser Ile Thr Leu Ser Trp Leu Pro Gly Asp Asn Gly Gly Ser Ser
1395 1400 1405

Ile Arg Gly Tyr Ile Leu Gln Tyr Ser Glu Asp Asn Ser Glu Gln Trp
1410 1415 1420

Gly Ser Phe Pro Ile Ser Pro Ser Glu Arg Ser Tyr Arg Leu Glu Asn
1425 1430 1435 1440

Leu Lys Cys Gly Thr Trp Tyr Lys Phe Thr Leu Thr Ala Gln Asn Gly
1445 1450 1455

Val Gly Pro Gly Arg Ile Ser Glu Ile Ile Glu Ala Lys Thr Leu Gly
1460 1465 1470

Lys Glu Pro Gln Phe Ser Lys Glu Gln Glu Leu Phe Ala Ser Ile Asn
1475 1480 1485

Thr Thr Arg Val Arg Leu Asn Leu Ile Gly Trp Asn Asp Gly Gly Cys
1490 1495 1500

Pro Ile Thr Ser Phe Thr Leu Glu Tyr Arg Pro Phe Gly Thr Thr Val
1505 1510 1515 1520

Trp Thr Thr Ala Gln Arg Thr Ser Leu Ser Lys Ser Tyr Ile Leu Tyr
1525 1530 1535

Asp Leu Gln Glu Ala Thr Trp Tyr Glu Leu Gln Met Arg Val Cys Asn
1540 1545 1550

Ser Ala Gly Cys Ala Glu Lys Gln Ala Asn Phe Ala Thr Leu Asn Tyr
1555 1560 1565

Asp Gly Ser Thr Ile Pro Pro Leu Ile Lys Ser Val Val Gln Asn Glu
1570 1575 1580

Glu Gly Leu Thr Thr Asn Glu Gly Leu Lys Met Leu Val Thr Ile Ser
1585 1590 1595 1600

Cys Ile Leu Val Gly Val Leu Leu Leu Phe Val Leu Leu Leu Val Val
1605 1610 1615

Arg Arg Arg Arg Arg Glu Gln Arg Leu Lys Arg Leu Arg Asp Ala Lys
1620 1625 1630

Ser Leu Ala Glu Met Leu Met Ser Lys Asn Thr Arg Thr Ser Asp Thr
1635 1640 1645

Leu Ser Lys Gln Gln Gln Thr Leu Arg Met His Ile Asp Ile Pro Arg
1650 1655 1660

Ala Gln Leu Leu Ile Glu Glu Arg Asp Thr Met Glu Thr Ile Asp Asp
1665 1670 1675 1680

Arg Ser Thr Val Leu Leu Thr Asp Ala Asp Phe Gly Glu Ala Ala Lys
1685 1690 1695

Gln Lys Ser Leu Thr Val Thr His Thr Val His Tyr Gln Ser Val Ser
1700 1705 1710

Gln Ala Thr Gly Pro Leu Val Asp Val Ser Asp Ala Arg Pro Gly Thr
1715 1720 1725

Asn Pro Thr Thr Arg Arg Asn Ala Lys Ala Gly Pro Thr Ala Arg Asn
1730 1735 1740

Arg Tyr Ala Ser Gln Trp Thr Leu Asn Arg Pro His Pro Thr Ile Ser
1745 1750 1755 1760

Ala His Thr Leu Thr Thr Asp Trp Arg Leu Pro Thr Pro Arg Ala Ala
1765 1770 1775

Gly Ser Val Asp Lys Glu Ser Asp Ser Tyr Ser Val Ser Pro Ser Gln
1780 1785 1790

Asp Thr Asp Arg Ala Arg Ser Ser Met Val Ser Thr Glu Ser Ala Ser
1795 1800 1805

Ser Thr Tyr Glu Glu Leu Ala Arg Ala Tyr Glu His Ala Lys Met Glu
1810 1815 1820

Glu Gln Leu Arg His Ala Lys Phe Thr Ile Thr Glu Cys Phe Ile Ser
1825 1830 1835 1840

Asp Thr Ser Ser Glu Gln Leu Thr Ala Gly Thr Asn Glu Tyr Thr Asp
1845 1850 1855

Ser Leu Thr Ser Ser Thr Pro Ser Glu Ser Gly Ile Cys Arg Phe Thr
1860 1865 1870

Ala Ser Pro Pro Lys Pro Gln Asp Gly Gly Arg Val Met Asn Met Ala
1875 1880 1885

Val Pro Lys Ala Ile Gly Gln Val Thr Ser Tyr Ile Cys Leu His Thr
1890 1895 1900

Leu Glu Trp Thr Phe Cys
1905 1910

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 388 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CCGGGTATTC TTACTCATGA GCATTTCAGC TAAACTCTTT GCATCTCGCA GCCTCTTTAG 60

CCTCTGCTCC CGCCGCCTCC TCCGCACAAC CAGCAGGAGC ACAAACAGCA GCAAGACCCC 120

CACCAGGATA CAGGAGATGG TCACCAGCAT CTTGAGCCCC TCGTTGGTCG TCAGCCCTTC 180

TTCGTTTTGG ACAACTGACT TAATGAGTGG AGGAATTGTA CTGCCATCGT AGTTCAGCGT 240

AGCGAAGTTG GCCTGCTTCT CCGCGCAGCC CGCACTGTTG CACACCCGCA TCTGCAGCTC 300

ATACCAGGTG GCTTCCTGCA GGTCATACAG GATGTAGGAC TTGGAGAGAG AGGTCCTCTG 360

AGCTGTGGTC CAAACTGTGG TCCCAAAG 388

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
CCTGATGCTC GAGTGAATTC 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
CCAGTTCTCA AAGGAGCAGG

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CCTGTATGAC CTGCAGGAAG 



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 842 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CCGGGCCGGG CGCGGCGGAG CGCAGCGCAA CGCGGGGGGC GAGGCCGGCG CGTGGCTCGC 60

TCGCTGGCTC GCTGGCTCGC GGGAGGCCGG GCAGCAGCAG GGGCATGTGG ATACTGGCTC 120

TCTCCTTGTT CCAGAGCTTC GCGAATGTTT TCAGTGAAGA GCCCCACTCC AGCCTCTACT 180

TTGTCAATGC ATCGCTGCAA GAGGTAGTGT TTGCAAGCAC ATCGGGGACG CTGGTGCCCT 240

GCCCGGCTGC AGGCATCCCT CCTGTGACTC TCAGATGGTA CCTAGCAACG GGCGAGGAGA 300

TCTACGATGT CCCCGGGATC CGCCACGTCC ATCCCAATGG CACTCTCCAA ATTTTCCCCT 360

TTCCTCCTTC AAGCTTCAGC ACCTTAATCC ATGATAATAC TTACTATTGC ACAGCTGAAA 420

ACCCTTCAGG GAAAATTAGA AGTCAGGATG TCCACATCAA GGCTGTTTTA CGGGAGCCCT 480

ATACAGTCCG TGTGGAGGAC CAGAAAACCA TGAGAGGCAA TGTCGCGGTG TTCAAGTGCA 540

TTATCCCCTC CTCGGTGGAG GCGTACGTCT CTGTCGTCTC ATGGGAGAAA GACACGGTTT 600

CACTTGTCTC AGGATCTAGA TTTCTCATCA CATCCACGGG AGCCTTGTAT ATTAAAGATG 660

TTCAGAACGA AGATGGGCTG TACAACTACC GCTGCATCGC GCGGCACAGA TTCGCGGGGG 720

AGACGAGACA GAGCAACTGC GCGAGACTGT TCGTGTCAGA ACCAGCAAAC TCAGCCCATC 780

CATCCTGGAA GGGTTTGACC ACCGCCAAAC CATGGCCGGG CACGCGTGGA GCTGCCTTGC 840

CA 842

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 898 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

TGCCGGCCGG TTGCAAGCCT GTACTACAGG CCATACTGCG TGAATTATCA GGTTGTCCAG 60

GGTGTACACT TCGCTGTCCC GGTGGTGTCA ATACTGATGA TGTTGAACTG GAAGTTACCC 120

CGTGCTGTAC TCCGGTAGCC TATTGGTAGC CGCGAATGAT CCCGTCTTGT ATAGTGTTCT 180

TGGGAGCCTC TCCAGGTAAC CCTGATACTC TGAGATGAGG TGGGTTCCAA GTGAACTTCC 240

TGAGGTGGAC ATCACGAGCT GCCTCATCCG CCGTGATGGT GATCTCGTTG CTGGGCTCAC 300

TCTTGCCAAT CCGGTTCTTG GCGTACATGC GGATGCTGTA GGTGGAGGAA GGGTGGATAT 360

CAATGATGGT GGCCGAGTTC AGCTGAGGGG AAACATCTTT GGTTCTCTGA GCAGAATCCC 420

ACGAGTCTGA TTTATTTTTG CATTCACACT GTCATAGCCT GTGATGGGGC TGTTGCCATC 480

AAACCCCATG GTCCACCTGA GCGTGATGGT GCGAGCTTTG ACATCTCTTG ATCTCAATCT 540

CGGGAGGATC TGGGGGTTCT TGCACTGTGA GTTGAATTAT TCCACGGTCC TCCCCGTATG 600

AATTGATAGC ATGGCAGGAG AAGAAACCGG AATCTTCTCT CACTGTTGGC AAAATCTGCA 660

GCGTAGATAT CACTTCCTCT CCCACCTCCT TGGTGGATAC AGTACGGGCC ACTTTCAGGG 720

TTAATGATCC TGTCTCTCTT CTCCAGCGGA CAATGATGGG CTCTCCCATG GGCTGTGCAG 780

CTCATTCCTT CCTTTGACCC TGATGGCCAG GTGGTGTGGG TATAAGTTAT ATCATGGCCG 840

GAATTTCCCT GTGAGTCCAT GGACTTGCTG AACGTTCTGC GCCCACATCG TTCGCTGA 898

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2173 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

ACCACCATTC ACACACCCAG ACATGGCGGG TTCGCGGCAA CCTTCAGTTC CTGGCCTTCC 60

TGTAGGGTAA AGGGCTGCTG CGGGTTTATA GACCGGCACA TGCCCATCCT GGCATACGGT 120

GGCCAGTGGC TTTCCATCTG GATTCCAGGC CAAGCTAAAA ATCTGTTCCT GATGGCCCTG 180

CAGTTTCAGC ^--ir^rnrn<^7\/-'/^rn/-^ L-G 1 i UALiL^ 1 ^ P APTPTf^ AAC^ TTCCCAGATG CGAACGGTTA GATCATAGGA 240

ACTGGAAGCC AL? i AUA 1 L^^j^cr GTGGAAGCGC AGAGAGTAGA TCTTTTCTGT 300

GTGGCCTGTG ACjv_-AL-Atj 1 1 p A p T n T T GAGAACATTC TCGAGCCAGC GAGCGTTCAT 360

TL /~i rn rn 7\ ACCGCTl GGA AAAL- 1 JAJr\£\\D TPTHC^G APTT GGTATAAGTT CACCCTTACT GCCCAAAATG 420

GAGTAGG 1 GU Ac^TC^AAATCA TAGAAGCCAA AACCCTGGGG A?\AGAACCCC 480

AGi iGiGCA/l rTTTTCGCCA GCATCAATAC CACCCGAGTG AGGCTGAATC 540

m/"" 7\ rp rn /-< 1^ /"I nn 1 GAl 1 GtjjU 1 VJJ GGCTGTCCAA TCACCTCATT CACTCTTGAA TACAGACCCT 600

ACAGCTCAGC GGACCTCCCT TTCCAAGTCC TAACATTCTG 660

GTGGTATGAA CTGCAGATGA GAGTGTGCAA CAGCGCCGGC 720

1 G 1 GGC:rLi/\ i ACCAAn.CrAA CTTCGCCACG CTGAACTACG ATGGCAGTAC AATCCCTCCA 780

CTGAi lAAGi nACVTCVnC^A 1 J- 1 ^'wri PAAAGCGAAG AAGGGCTGAC AACCAACGAA GGGCTCAAGA 840

TCCTGGiGAG L-A J- U 1 i GGGTTCTACT GCTCTTTGTG CTTCTGCTGG 900

TTGTGCGGAG GAGAGGGUkjA P A PP AP APPP TPAAPAGGCT GAGAGATGCA AAGAGTTTAG 960

CTGAAATGCT U A 1 Avd U AHii A AP AP APPPA CTTCAGATAC CTTAAGCAAA CAGCAGCAGA 1020

CT T T GAGAA i (jCAUAi l<jAl AT APPPAPPP CTCAGCTTTT GATTGAAGAG AGAGACACAA 1080

T G GAG AG Gii 1 AoA i (oi^^wo^ TPPAPAGTCC TGTTGACGGA TGCTGACTTC GGGGAGGCAG 1140

G G AA/i(^ Atj Ail PTP2XPTr;APZ\ GTGAGTCACA CGGTGCATTA CCAATCGGTG TCTCAGGCCA 1200

TCCGATGCTC GGCCAGGAAC GAATCCCACC ACCAGGAGGA 1260

Al GGAAAL^taL, TPPZKPPPaPA PPGAP7VAACC GGTACGCCAG CCAGTGGACG CTCAACAGAC 1320

UL>UAi (^(^ liik^ p ATPTPTHPA CACACCCTCA CCACAGAATG AGACTGCTAC ACCAGGCTAC 1380

AGGAi PAPAGTACAG CGTCAGCCCA TTCACT^AGAC ACAGACGAGC 1440

AAG AAG U ACir O ATPTTPTPPA PAPAAAGTGC TTCTTCTACC TACGAAGACT GCCAGGCCTA 1500

rn/~"7\ TV 1^ 7\ 1^ ^ O TGAAGAGGL-U HaPATPPAAP APPAPPTPAG GCATGCCAAG TTCACCATCA CAGAGTGCTT 1560

CATATCCGAT AGG i GG i UOtji APPAPTTPAP PPP AG GAGAA ATGAGTACAC GGACAGTCTG 1620

AC TO GAG TAG /-^ /-I m m TV TV TV CCCTTGAGAA TT^r^r^paT'PTP i UIjVjIjtAI ± o PAPATTPATP CATCTCCCCC CAACCTCAGG 1680

ATGGAGGACG AGTGTGAAGA TT^^^^^r^PT"T'PP A A APPPPPAT CGGCCAGGCG ACTCATACAC 1740

CTGCTCCATA CG i AGCzfAI CaU a'T'T'P'T'T'PTT'A /\1 XL^i iol X/^ AAPPPPPPGC ACCAGGCACC AGCAGGACTG 1800

AGTTTAGGAC AAGCGTGCTT GGAACCCCAG AAAGTCGGAC CCTGAAACGC UGOACtjtj i K^\j 1860

TTGAGCCCAC CCCTATGGAG GCCTCCTCCT CCACTTCTTC CACGCGAGAA GGACAGCAGT 1920

CGTGGCAACA AGGGGCTGTG GCCACCTTAC CTCAGCGAGA GGGTGCAGAG CTGGACAGGC 1980

AGCTAAAATG AGCAGCTCCC AAGAGTCACT GCTGGACTCC CGGGCCATTG AAAGGAACAA 2040

TCCCTACGCA AATCTTACAC CTTGGTATAA CACATGGCAC TGATGGACAG CGGTTGTAAT 2100

ACAATTAACG AGCCAATCAA GCTACTTTTT TATGAATTCC GATATTTATA ATTAAGAATT 2160

GCCAAATATA TTA 2173
(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6413 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 453..5168

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

TGACTGAGGC CGGAGCACGG CAAAGATGAG CCTGCCCGCC CGCCTGCTGC CTGGATGCGG 60

AGGGTGAGGG CTGGCGCACG GGAGGCCGCT GGCTGCGCAT TCTGGGCGCC GAGTGCCCGG 120

GATGAGCTCA CGCCCGCGTC TGCGGCTCTC TCCACCTGCC GACCTGCCGG GGGCCCACTG 180

AGCTGACGGC GCACCTGGGC TCCGGCCGCA GCGTGGGGCG CGGCGCCCGG GAGCAGGTGT 240

GCAGGAGCGC AGCGCGCGGC GAGCGCAGCC CTCGCTCCGG AGCCCGGCCG CGCCGCGTGC 300

CCGGGCGGCT AGGCAGCGGC GGCGGCGGCG GCGGGCGGCG GGCGGGCGGC GGCCCCCGGG 360

CAGGTGCCGA GCGGCGAGCG GAGCCGGGCC GGGCGGAGCG CGGGGGGCGA GGCCGGCGCG 420

TCGCTCGCGG GAGGCCGGGG AGCGGCAGGG GC ATG TGG ATA CTG GCT CTC TCC 473
Met Trp Ile Leu Ala Leu Ser
1 5

TTG TTC CAG AGC TTC GCG AAT GTT TTC ACT GAA GAC CTA CAC TCC AGC 521
Leu Phe Gln Ser Phe Ala Asn Val Phe Ser Glu Asp Leu His Ser Ser
10 15 20

CTC TAC TTT GTC AAT GCA TCT CTG CAA GAG GTA GTG TTT GCC AGC ACC 569
Leu Tyr Phe Val Asn Ala Ser Leu Gln Glu Val Val Phe Ala Ser Thr
25 30 35

ACG GGG ACT CTG GTG CCC TGC CCC GCA GCA GGC ATC CCT CCT GTG ACT 617
Thr Gly Thr Leu Val Pro Cys Pro Ala Ala Gly Ile Pro Pro Val Thr
40 45 50 55

CTC AGA TGG TAC CTA GCC ACG GGC GAG GAG ATC TAC GAT GTC CCC GGG 665
Leu Arg Trp Tyr Leu Ala Thr Gly Glu Glu Ile Tyr Asp Val Pro Gly
60 65 70

ATC CGC CAC GTC CAC CCC AAC GGC ACT CTC CAA ATT TTC CCC TTC CCT 713
Ile Arg His Val His Pro Asn Gly Thr Leu Gln Ile Phe Pro Phe Pro
75 80 85

CCT TCA AGC TTC AGT ACC TTA ATC CAT GAT AAT ACT TAT TAT TGC ACA 761
Pro Ser Ser Phe Ser Thr Leu Ile His Asp Asn Thr Tyr Tyr Cys Thr
90 95 100

GCT GAA AAT CCT TCA GGG AAA ATT AGA AGT GAG GAT GTC GAG ATC AAG 809
Ala Glu Asn Pro Ser Gly Lys Ile Arg Ser Gln Asp Val His Ile Lys
105 110 115

GCT GTT TTA CGG GAG CCC TAT ACA GTC CGT GTG GAG GAG CAG AAA ACC 857
Ala Val Leu Arg Glu Pro Tyr Thr Val Arg Val Glu Asp Gln Lys Thr
120 125 130 135

ATG AGA GGC AAT GTT GCG GTC TTC AAG TGC ATT ATC CCC TCC TCG GTG 905
Met Arg Gly Asn Val Ala Val Phe Lys Cys Ile Ile Pro Ser Ser Val
140 145 150

GAG GCG TAG ATC ACT GTC GTC TCA TGG GAG AAA GAG ACT GTT TCA GTT 953
Glu Ala Tyr Ile Thr Val Val Ser Trp Glu Lys Asp Thr Val Ser Leu
155 160 165

GTC TCA GGA TCT AGA TTT GTC ATC ACA TCC ACG GGA GCG TTG TAT ATT 1001
Val Ser Gly Ser Arg Phe Leu Ile Thr Ser Thr Gly Ala Leu Tyr Ile
170 175 180

AAA GAT GTA CAG AAT GAA GAT GGA TTG TAT AAG TAG CGG TGC ATC ACG 1049
Lys Asp Val Gln Asn Glu Asp Gly Leu Tyr Asn Tyr Arg Cys Ile Thr
185 190 195

CGG CAT GGA TAG ACC GGA GAG ACG AGG CAG AGC AAG AGC GCC AGA GTT 1097
Arg His Arg Tyr Thr Gly Glu Thr Arg Gln Ser Asn Ser Ala Arg Leu
200 205 210 215

TTT GTA TCA GAG CCA GCG AAC TCA GCC CCA TCC ATA GTG GAT GGG TTT 1145
Phe Val Ser Asp Pro Ala Asn Ser Ala Pro Ser Ile Leu Asp Gly Phe
220 225 230

GAG CAT CGC ATiA GCC ATG GCT GGG CAG CGT GTG GAG CTG CCT TGC AAA 1193
Asp His Arg Lys Ala Met Ala Gly Gln Arg Val Glu Leu Pro Cys Lys
235 240 245

GCG GTC GGG CAG CCT GAG CCA GAT TAG CGC TGG CTG AAG GAC AAC ATG 1241
Ala Leu Gly His Pro Glu Pro Asp Tyr Arg Trp Leu Lys Asp Asn Met
250 255 260

CCC CTG GAA CTT TCA GGG AGG TTC CAG AAG ACC GTG ACG GGG CTG CTC 1289
Pro Leu Glu Leu Ser Gly Arg Phe Gln Lys Thr Val Thr Gly Leu Leu
265 270 275

ATT GAG AAC ATT CGC CCC TCG GAC TCA GGC AGC TAT GTT TGT GAA GTG 1337
Ile Glu Asn Ile Arg Pro Ser Asp Ser Gly Ser Tyr Val Cys Glu Val
280 285 290 295

TCC AAC AGA TAG GGA ACT GCT AAG GTG ATA GGC CGC CTG TAG GTG AAA 1385
Ser Asn Arg Tyr Gly Thr Ala Lys Val Ile Gly Arg Leu Tyr Val Lys
300 305 310

CAG CCA CTG AAA GCC ACC ATC AGT CCC AGG AAG GTT AAA AGC AGC GTG 1433
Gln Pro Leu Lys Ala Thr Ile Ser Pro Arg Lys Val Lys Ser Ser Val
315 320 325

GGT AGC GAA GTT TCC TTG TCC TGC AGC GTG ACA GGA ACT GAG GAC CAG 1481
Gly Ser Gln Val Ser Leu Ser Cys Ser Val Thr Gly Thr Glu Asp Gln
330 335 340

GAA CTC TCC TGG TAG CGC AAT GGT GAA ATC CTC AAC CCT GGA AAA AAT 1529
Glu Leu Ser Trp Tyr Arg Asn Gly Glu Ile Leu Asn Pro Gly Lys Asn
345 350 355

GTG AGG ATC ACA GGG ATC AAC CAC GAA AAC CTT ATA ATG GAT CAC ATG 1577
Val Arg Ile Thr Gly Ile Asn His Glu Asn Leu Ile Met Asp His Met
360 365 370 375

GTG AAA AGT GAG GGG GGC GCA TAG GAG TGC TTT GTG CGC AAG GAG AAG 1625
Val Lys Ser Asp Gly Gly Ala Tyr Gln Cys Phe Val Arg Lys Asp Lys
380 385 390

GTG TCC GCT CAA GAG TAT GTG GAG GTG GTG GTT GAA GAT GGA ACT CCC 1673
Leu Ser Ala Gln Asp Tyr Val Gln Val Val Leu Glu Asp Gly Thr Pro
395 400 405

AAA ATT ATT TCT GGC TTT AGT GAA AAG GTG GTG AGT CCA GCA GAG GGG 1721
Lys Ile Ile Ser Ala Phe Ser Glu Lys Val Val Ser Pro Ala Glu Pro
410 415 420

GTT TCC CTT ATG TGC AAC GTG AAG GGA ACA CCT TTG CCC ACG ATC AGG 1769
Val Ser Leu Met Cys Asn Val Lys Gly Thr Pro Leu Pro Thr Ile Thr
425 430 435

TGG ACC CTG GAC GAT GAG CCG ATT CTC AAG GGT GGC AGT CAC CGC ATC 1817
Trp Thr Leu Asp Asp Asp Pro Ile Leu Lys Gly Gly Ser His Arg Ile
440 445 450 455

AGG CAG ATG ATC ACG TCG GAG GGG AAC GTG GTC AGG TAG CTG AAC ATC 1865
Ser Gln Met Ile Thr Ser Glu Gly Asn Val Val Ser Tyr Leu Asn Ile
460 465 470

TCC AGC TCC CAG GTC CGG GAC GGG GGA GTC TAG CGC TGC ACT GGG AAC 1913
Ser Ser Ser Gln Val Arg Asp Gly Gly Val Tyr Arg Cys Thr Ala Asn
475 480 485

AAC TCG GCG GGA GTC GTC CTG TAG CAG GCT CGA ATA AAC GTA AGA GGG 1961
Asn Ser Ala Gly Val Val Leu Tyr Gln Ala Arg Ile Asn Val Arg Gly
490 495 500

CCT GCA AGC ATT CGA CCA ATG AAA AAC ATC ACA GCA ATA GCA GGA CGG 2009
Pro Ala Ser Ile Arg Pro Met Lys Asn Ile Thr Ala Ile Ala Gly Arg
505 510 515

GAC ACA TAG ATT CAC TGT CGT GTG ATT GGC TAT CCG TAT TAG TCC ATT 2057
Asp Thr Tyr Ile His Cys Arg Val Ile Gly Tyr Pro Tyr Tyr Ser Ile
520 525 530 535

AAA TGG TAG AAG AAC TCT AAC CTG CTT CCT TTG AAC GAC CGC CAA GTG 2105
Lys Trp Tyr Lys Asn Ser Asn Leu Leu Pro Phe Asn His Arg Gln Val
540 545 550

GCA TTT GAG AAC AAT GGA ACT CTT AAA CTT TCA GAT GTG GAA AAG GAA 2153
Ala Phe Glu Asn Asn Gly Thr Leu Lys Leu Ser Asp Val Gln Lys Glu
555 560 565

GTG GAC GAG GGG GAG TAG AGG TGC AAC GTG TTG GTT CAA CCA CAA CTC 2201
Val Asp Glu Gly Glu Tyr Thr Cys Asn Val Leu Val Gln Pro Gln Leu
570 575 580

TCG ACC AGC CAG AGC GTC CAC GTG ACC GTG AAA GTT CCG CCT TTG ATA 2249
Ser Thr Ser Gln Ser Val His Val Thr Val Lys Val Pro Pro Phe Ile
585 590 595

CAA CCC TTT GAG TTT CCA AGA TTC TCC ATT GGG GAG CGG GTC TTG ATC 2297
Gln Pro Phe Glu Phe Pro Arg Phe Ser Ile Gly Gln Arg Val Phe Ile
600 605 610 615

CCC TGT GTT GTG GTC TCA GGG GAC TTA CCC ATC AGG ATC ACC TGG GAG 2345
Pro Cys Val Val Val Ser Gly Asp Leu Pro Ile Thr Ile Thr Trp Gln
620 625 630

AAG GAT GGC CGG CCA ATC CCT GGG AGC CTT GGG GTG ACC ATT GAC AAT 2393
Lys Asp Gly Arg Pro Ile Pro Gly Ser Leu Gly Val Thr Ile Asp Asn
635 640 645

ATT GAC TTC ACG AGC TCC TTG AGG ATT TCC AAT CTC TCG CTC ATG CAC 2441
Ile Asp Phe Thr Ser Ser Leu Arg Ile Ser Asn Leu Ser Leu Met His
650 655 660

AAT GGG AAT TAG ACC TGC ATA GCC CGG AAT GAG GCC GCC GCT GTG GAG 2489
Asn Gly Asn Tyr Thr Cys Ile Ala Arg Asn Glu Ala Ala Ala Val Glu
665 670 675

CAC CAA AGC CAG TTG ATT GTC AGA GTT CCT CCC AAG TTT GTG GTT CAG 2537
His Gln Ser Gln Leu Ile Val Arg Val Pro Pro Lys Phe Val Val Gln
680 685 690 695

CCA CGG GAC CAG GAC GGG ATT TAT GGC AAA GCA GTC ATC CTC AAT TGT 2585
Pro Arg Asp Gln Asp Gly Ile Tyr Gly Lys Ala Val Ile Leu Asn Cys
700 705 710

TCT GCT GAG GGT TAG CCT GTA CCT ACC ATC GTG TGG AAA TTC TGT AAA 2633
Ser Ala Glu Gly Tyr Pro Val Pro Thr Ile Val Trp Lys Phe Ser Lys
715 720 725

GGT GCT GGG GTT CCC CAG TTC CAG CCA ATT GCC CTA AAT GGC CGA ATC 2681
Gly Ala Gly Val Pro Gln Phe Gln Pro Ile Ala Leu Asn Gly Arg Ile
730 735 740

CAA GTT CTC AGC AAT GGG TCG TTG CTG ATC AAG CAT GTC GTG GAG GAA 2729
Gln Val Leu Ser Asn Gly Ser Leu Leu Ile Lys His Val Val Glu Glu
745 750 755

GAC AGT GGC TAG TAG CTC TGC AAG GTC AGC AAC GAT GTG GGC GCA GAC 2777
Asp Ser Gly Tyr Tyr Leu Cys Lys Val Ser Asn Asp Val Gly Ala Asp
760 765 770 775

GTC AGC AAG TCC ATG TAG CTC ACG GTT AAA ATT CCT GCG ATG ATA AGA 2825
Val Ser Lys Ser Met Tyr Leu Thr Val Lys Ile Pro Ala Met Ile Thr
780 785 790

TCC TAT CCA AAT ACT ACC CTG GCC ACG CAG GGG CAG AAA AAG GAG ATG 2873
Ser Tyr Pro Asn Thr Thr Leu Ala Thr Gln Gly Gln Lys Lys Glu Met
795 800 805

AGC TGC ACG GCG CAT GGT GAG AAG CCC ATT ATA GTC CGG TGG GAG AAG 2921
Ser Cys Thr Ala His Gly Glu Lys Pro Ile Ile Val Arg Trp Glu Lys
810 815 820

GAG GAC CGA ATC ATT AAC CCT GAG ATG GCC CGT TAT CTT GTG TCC ACC 2969 
Glu Asp Arg He He Asn Pro Glu Met Ala Arg Tyr Leu Val Ser Thr 
825 830 835 

AAG GAG GTG GGA GAA GAG GTG ATT TCT ACT CTG CAG ATT TTG CCA ACT 3017 
Lvs Glu Val Gly Glu Glu Val He Ser Thr Leu Gin He Leu Pro Thr 
840 845 850 855 

GTG AGA GAA GAT TCT GGT TTC TTT TCC TGC CAT GCT ATT AAT TCT TAT 3065 
Val Arq Glu Asp Ser Gly Phe Phe Ser Cys His Ala He Asn Ser Tyr 
860 865 870 

GGG GAG GAG CGT GGA ATA ATT CAG CTC ACA GTG CAA GAG CCC CCA GAC 3113
Gly Glu Asp Arg Gly Ile Ile Gln Leu Thr Val Gln Glu Pro Pro Asp
875 880 885

CCT CCC GAA ATT GAG ATC AAA GAT GTC AAA GCA CGC ACA ATT ACG CTC 3161
Pro Pro Glu Ile Glu Ile Lys Asp Val Lys Ala Arg Thr Ile Thr Leu
890 895 900

AGG TGG ACC ATG GGG TTT GAT GGA AAC AGT CCC ATC ACA GGC TAC GAT 3209
Arg Trp Thr Met Gly Phe Asp Gly Asn Ser Pro Ile Thr Gly Tyr Asp
905 910 915

ATT GAA TGC AAA AAT AAA TCA GAC TCC TGG GAT TCT GCT CAG AGA ACC 3257
Ile Glu Cys Lys Asn Lys Ser Asp Ser Trp Asp Ser Ala Gln Arg Thr
920 925 930 935

AAA GAT GTT TCC CCT CAG CTG AAC TCG GCC ACC ATC ATT GAT ATC CAC 3305
Lys Asp Val Ser Pro Gln Leu Asn Ser Ala Thr Ile Ile Asp Ile His
940 945 950

CCT TCC TCC ACC TAC AGC ATC CGC ATG TAC GCC AAG AAC CGG ATT GGC 3353
Pro Ser Ser Thr Tyr Ser Ile Arg Met Tyr Ala Lys Asn Arg Ile Gly
955 960 965

AAG AGC GAG CCC AGC AAC GAG CTC ACC ATC ACG GCG GAC GAG GCA GCT 3401
Lys Ser Glu Pro Ser Asn Glu Leu Thr Ile Thr Ala Asp Glu Ala Ala
970 975 980

CCT GAT GGT CCA CCT CAG GAA GTT CAC CTG GAG CCT ATA TCA TCT CAG 3449
Pro Asp Gly Pro Pro Gln Glu Val His Leu Glu Pro Ile Ser Ser Gln
985 990 995

AGC ATC AGG GTC ACA TGG AAG GCT CCC AAG AAA CAT TTG CAA AAT GGG 3497
Ser Ile Arg Val Thr Trp Lys Ala Pro Lys Lys His Leu Gln Asn Gly
1000 1005 1010 1015

ATT ATC CGT GGC TAC CAA ATA GGT TAC CGA GAG TAC AGC ACT GGG GGT 3545
Ile Ile Arg Gly Tyr Gln Ile Gly Tyr Arg Glu Tyr Ser Thr Gly Gly
1020 1025 1030

AAC TTC CAA TTC AAC ATT ATC AGT GTC GAC ACC AGC GGG GAC AGT GAG 3593
Asn Phe Gln Phe Asn Ile Ile Ser Val Asp Thr Ser Gly Asp Ser Glu
1035 1040 1045

GTT TAC ACC CTG GAC AAC CTG AAT AAG TTC ACT CAG TAC GGC CTG GTG 3641
Val Tyr Thr Leu Asp Asn Leu Asn Lys Phe Thr Gln Tyr Gly Leu Val
1050 1055 1060

GTG CAG GCC TGT AAC CGG GCC GGC ACG GGG CCT TCT TCT CAG GAA ATC 3689
Val Gln Ala Cys Asn Arg Ala Gly Thr Gly Pro Ser Ser Gln Glu Ile
1065 1070 1075

ATC ACC ACC ACT CTC GAG GAT GTG CCC AGT TAC CCC CCC GAA AAT GTC 3737
Ile Thr Thr Thr Leu Glu Asp Val Pro Ser Tyr Pro Pro Glu Asn Val
1080 1085 1090 1095

CAA GCC ATA GCA ACA TCA CCA GAA AGC ATA TCA ATA TCC TGG TCC ACA 3785
Gln Ala Ile Ala Thr Ser Pro Glu Ser Ile Ser Ile Ser Trp Ser Thr
1100 1105 1110

CTT TCC AAG GAA GCC TTG AAT GGA ATT CTC CAG GGG TTC AGA GTC ATT 3833
Leu Ser Lys Glu Ala Leu Asn Gly Ile Leu Gln Gly Phe Arg Val Ile
1115 1120 1125

TAG TGG GGG J^C GTG ATG GAG GGA GAG GTG GGT GAG ATT AAA AAG ATC 3881
Tyr Trp Ala Asn Leu Met Asp Gly Glu Leu Gly Glu Ile Lys Asn Ile
1130 1135 1140

ACC ACG AGA GAG GGT TCA GTG GAG GTG GAG GGG GTG GAA AAG TAG AGG 3929
Thr Thr Thr Gln Pro Ser Leu Glu Leu Asp Gly Leu Glu Lys Tyr Thr
1145 1150 1155

AAG TAG AGG ATG GAG GTG GTG GGG TTG AGG GGG GGA GGA GAC GGG GTG 3977
Asn Tyr Ser Ile Gln Val Leu Ala Phe Thr Arg Ala Gly Asp Gly Val
1160 1165 1170 1175

AGO AGT GAG GAG ATG TTG AGG GGG ACG AAA GAG GAT GTT GGA GGT GGT 4025
Arg Ser Glu Gln Ile Phe Thr Arg Thr Lys Glu Asp Val Pro Gly Pro
1180 1185 1190

CGG GGG GGT GTG AAG GGA GGG GGG GGG TCA GGG TGG ATG GTG TTT GTG 4073
Pro Ala Gly Val Lys Ala Ala Ala Ala Ser Ala Ser Met Val Phe Val
1195 1200 1205

TGG TGG GTT GGG GGT GTG AAG GTG AAG GGG ATG ATG GGA AAG TAG AGT 4121
Ser Trp Leu Pro Pro Leu Lys Leu Asn Gly Ile Ile Arg Lys Tyr Thr
1210 1215 1220

GTA TTG TGG TGG GAG CGG TAT GGG AGA GTG ATC AGC GAG TTT GAG GGG 4169
Val Phe Cys Ser His Pro Tyr Pro Thr Val Ile Ser Glu Phe Glu Ala
1225 1230 1235

TCT CGG GAG TGG TTT TGG TAG AGA ATT GGG AAG GTG AGT AGG AAT GGT 4217
Ser Pro Asp Ser Phe Ser Tyr Arg Ile Pro Asn Leu Ser Arg Asn Arg
1240 1245 1250 1255

GAG TAG AGG GTG TGG GTG GTG GGT GTT ACT TCA GGG GGA AGA GGG AAG 4265
Gln Tyr Ser Val Trp Val Val Ala Val Thr Ser Ala Gly Arg Gly Asn
1260 1265 1270

AGG AGT GAA ATG ATC AGA GTG GAG GGA GTA GGA AAA GGT GGT GGA GGA 4313
Ser Ser Glu Ile Ile Thr Val Glu Pro Leu Ala Lys Ala Pro Ala Arg
1275 1280 1285

ATG GTG AGG TTG AGT GGG AGA GTG AGT AGT GGA TGG ATG AAA GAG ATT 4361
Ile Leu Thr Phe Ser Gly Thr Val Thr Thr Pro Trp Met Lys Asp Ile
1290 1295 1300

GTG TTG GGT TGT AAG GGT GTT GGG GAC GGT TCT GGT GGA GTG AAA TGG 4409
Val Leu Pro Cys Lys Ala Val Gly Asp Pro Ser Pro Ala Val Lys Trp
1305 1310 1315

ATG AAA GAG AGT AAG GGG ACA GGG AGT GTA GTA AGG ATT GAT GGG CGG 4457
Met Lys Asp Ser Asn Gly Thr Pro Ser Leu Val Thr Ile Asp Gly Arg
1320 1325 1330 1335

AGG AGC ATC TTT AGC AAC GGA AGC TTC ATT ATT CGG ACG GTG AAA GGA 4505
Arg Ser Ile Phe Ser Asn Gly Ser Phe Ile Ile Arg Thr Val Lys Ala
1340 1345 1350

GhA GAG TGG GGG TAT TAG AGC TGG ATT GGG AAT AAC AAG TGG GGA TCT 4553
Glu Asp Ser Gly Tyr Tyr Ser Cys Ile Ala Asn Asn Asn Trp Gly Ser
1355 1360 1365

GAT GAA ATT ATT TTA AAG TTA GAA GTA GAA GTT CCA CCA GAT GAG GGT 4601
Asp Glu Ile Ile Leu Asn Leu Gln Val Gln Val Pro Pro Asp Gln Pro
1370 1375 1380

CGG CTT ACA GTC TCC AAG ACC ACG TCT TCC TCC ATC AGO CTT TOT TGG 4649
Arg Leu Thr Val Ser Lys Thr Thr Ser Ser Ser Ile Thr Leu Ser Trp
1385 1390 1395

CTC OCT GGA GAG AAC GGG GGC AGO TCT ATC AGA GGA TAG ATA CTG GAG 4697
Leu Pro Gly Asp Asn Gly Gly Ser Ser Ile Arg Gly Tyr Ile Leu Gln
1400 1405 1410 1415

TAG TCC GAG GAG AAT ACT GAG CAG TGG GGG AGT TTT CCA ATC AGC CCC 4745
Tyr Ser Glu Asp Asn Ser Glu Gln Trp Gly Ser Phe Pro Ile Ser Pro
1420 1425 1430

AGC GAA CGT TCC TAT CGG TTG GAA AAT CTC AAA TGT GGG ACT TGG TAT 4793
Ser Glu Arg Ser Tyr Arg Leu Glu Asn Leu Lys Cys Gly Thr Trp Tyr
1435 1440 1445

AAG TTC ACA CTG ACA GCC GAA AAT GGA GTG GGC CCA GGG CGC ATA AGT 4841
Lys Phe Thr Leu Thr Ala Gln Asn Gly Val Gly Pro Gly Arg Ile Ser
1450 1455 1460

GAA ATC ATA GAA GGA AAG ACC TTA GGA AAA GAG CCC CAG TTC TCA AAG 4889
Glu Ile Ile Glu Ala Lys Thr Leu Gly Lys Glu Pro Gln Phe Ser Lys
1465 1470 1475

GAG CAG GAG CTG TTT GCC AGC ATC AAG ACC ACA CGC GTG AGG CTG AAC 4937
Glu Gln Glu Leu Phe Ala Ser Ile Asn Thr Thr Arg Val Arg Leu Asn
1480 1485 1490 1495

CTC ATT GGC TGG AAT GAT GGC GGC TGC CCC ATC ACC TCC TTC ACA CTA 4985
Leu Ile Gly Trp Asn Asp Gly Gly Cys Pro Ile Thr Ser Phe Thr Leu
1500 1505 1510

GAG TAG AGG CCC TTT GGG ACC ACA GTT TGG ACC ACA GCT CAG AGG ACC 5033
Glu Tyr Arg Pro Phe Gly Thr Thr Val Trp Thr Thr Ala Gln Arg Thr
1515 1520 1525

TCT CTC TCC AAG TCC TAG ATC CTG TAT GAG CTG CAG GAA GCC ACC TGG 5081
Ser Leu Ser Lys Ser Tyr Ile Leu Tyr Asp Leu Gln Glu Ala Thr Trp
1530 1535 1540

TAT GAG CTG CAG ATG CGG GTG TGC AAC AGT GGG GGC TGC GCG GAG AAG 5129
Tyr Glu Leu Gln Met Arg Val Cys Asn Ser Ala Gly Cys Ala Glu Lys
1545 1550 1555

CAG GCT AAA GAG GCT GCG AGA TGC AAA GAG TTT AGC TGAAATGCTC 5175
Gln Ala Lys Glu Ala Ala Arg Cys Lys Glu Phe Ser
1560 1565 1570

ATGAGTAAGA ATACCCGGAC TTCAGATACG TTAAGCAAGC AACAGCAGAC CCTGCGAATG 5235

CACATCGACA TACCCAGGGC TCAGCTTTTG ATTGT^GAGA GAGACACGAT GGAGACCATT 5295

GATGATCGCT CCACGGTTCT GTTGACGGAT GCTGACTTTG GAGAGGCAGC TAAGCAGAAG 5355

TCCCTGACGG TCACTCACAC GGTCCATTAC CAATCGGTGT CTCAGGCCAC TGGGGCCTTA 5415

GTGGATGTTT CAGACGCTCG GCCGGGAACG AATCCCACCA CGAGGAGGAA TGCCAAGGCT 5475

GGGCCCACAG CGAGAAACCG CTATGCCAGC CAGTGGACCC TCAACCGACC CCACCCCACC 5535

ATCTCAGCAC ACACCCTCAC CACAGACTGG AGGCTGCCAA CACCCAGGGC TGCAGGATCA 5595

GTAGACAAAG AGAGGGACAG TTAGAGCGTC AGCCCCTCGG AAGACACAGA TCGAGCAAGA 5655 
AGCAGCATGG TCTCCACAGA AAGTGCCTCC TCCACTTACG AAGAACTGGC CAGGGCCTAC 5715

GAACACGCCA AGATGGAAGA GCAACTGAGG CACGCCAAGT TCACCATCAC GGAGTGCTTC 5775 

ATATCAGACA CGTCATCGGA GCAGTTGACG GCAGGGACAA ATGAGTACAC GGACAGTCTG 5835 

ACCTCCAGCA CCCCTTCCGA ATCGGGAATC TGCAGGTTCA CTGCATCTCC CCCCAAACCT 5895

CAGGATGGAG GAAGAGTAAT GAATATGGCA GTTCC7y\AGG CAATCGGCCA GGTGACCTCA 5955

TACATTTGCC TCCATACCTT AGAATGGACT TTTTGTTAAA CCGAGGTGGT CCAGGCACCA 6015 

GCAGGGACCT GAGCTTAGGA CAAGCATGCT TGGAACCTCA GAAAAGCCGG ACCCTGAAGC 6075 

GCCCCACGGT CCTGGAGCCC ATCCCGATGG AAGCCGCCTC CTCCGCCTCC TCCACGAGAG 6135 

AAGGACAGTC GTGGCAGCCG GGGGCCGTGG CCACATTACC TCAGCGGGAG GGAGCAGAGC 6195 

TGGGACAGGC AGCTAAAATG AGCAGCTCCC AAGAATCACT GCTCGACTCC CGGGGCCATT 6255 

TGAAAGGAAA CAATCCTTAC GCAAAATCTT ACACCCTGGT ATAACAGACA GCATGACTGG 6315 

ACAGCGGTTG TAAATACAAT TCAAACAATT CAATCAAAGC TACCTTTTTT TTACGGT^TT 6375

CCAATATTTA TAATTAAAGA AAATTGCCAA AATATATT 6413

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1571 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Trp Ile Leu Ala Leu Ser Leu Phe Gln Ser Phe Ala Asn Val Phe
1 5 10 15

Ser Glu Asp Leu His Ser Ser Leu Tyr Phe Val Asn Ala Ser Leu Gln
20 25 30

Glu Val Val Phe Ala Ser Thr Thr Gly Thr Leu Val Pro Cys Pro Ala
35 40 45

Ala Gly Ile Pro Pro Val Thr Leu Arg Trp Tyr Leu Ala Thr Gly Glu
50 55 60

Glu Ile Tyr Asp Val Pro Gly Ile Arg His Val His Pro Asn Gly Thr
65 70 75 80

Leu Gln Ile Phe Pro Phe Pro Pro Ser Ser Phe Ser Thr Leu Ile His
85 90 95

Asp Asn Thr Tyr Tyr Cys Thr Ala Glu Asn Pro Ser Gly Lys Ile Arg
100 105 110

Ser Gln Asp Val His Ile Lys Ala Val Leu Arg Glu Pro Tyr Thr Val
115 120 125

Arg Val Glu Asp Gln Lys Thr Met Arg Gly Asn Val Ala Val Phe Lys
130 135 140

Cys Ile Ile Pro Ser Ser Val Glu Ala Tyr Ile Thr Val Val Ser Trp
145 150 155 160

Glu Lys Asp Thr Val Ser Leu Val Ser Gly Ser Arg Phe Leu Ile Thr
165 170 175

Ser Thr Gly Ala Leu Tyr Ile Lys Asp Val Gln Asn Glu Asp Gly Leu
180 185 190

Tyr Asn Tyr Arg Cys Ile Thr Arg His Arg Tyr Thr Gly Glu Thr Arg
195 200 205

Gln Ser Asn Ser Ala Arg Leu Phe Val Ser Asp Pro Ala Asn Ser Ala
210 215 220

Pro Ser Ile Leu Asp Gly Phe Asp His Arg Lys Ala Met Ala Gly Gln
225 230 235 240

Arg Val Glu Leu Pro Cys Lys Ala Leu Gly His Pro Glu Pro Asp Tyr
245 250 255

Arg Trp Leu Lys Asp Asn Met Pro Leu Glu Leu Ser Gly Arg Phe Gln
260 265 270

Lys Thr Val Thr Gly Leu Leu Ile Glu Asn Ile Arg Pro Ser Asp Ser
275 280 285

Gly Ser Tyr Val Cys Glu Val Ser Asn Arg Tyr Gly Thr Ala Lys Val
290 295 300

Ile Gly Arg Leu Tyr Val Lys Gln Pro Leu Lys Ala Thr Ile Ser Pro
305 310 315 320

Arg Lys Val Lys Ser Ser Val Gly Ser Gln Val Ser Leu Ser Cys Ser
325 330 335

Val Thr Gly Thr Glu Asp Gln Glu Leu Ser Trp Tyr Arg Asn Gly Glu
340 345 350

Ile Leu Asn Pro Gly Lys Asn Val Arg Ile Thr Gly Ile Asn His Glu
355 360 365

Asn Leu Ile Met Asp His Met Val Lys Ser Asp Gly Gly Ala Tyr Gln
370 375 380

Cys Phe Val Arg Lys Asp Lys Leu Ser Ala Gln Asp Tyr Val Gln Val
385 390 395 400

Val Leu Glu Asp Gly Thr Pro Lys Ile Ile Ser Ala Phe Ser Glu Lys
405 410 415

Val Val Ser Pro Ala Glu Pro Val Ser Leu Met Cys Asn Val Lys Gly
420 425 430

Thr Pro Leu Pro Thr Ile Thr Trp Thr Leu Asp Asp Asp Pro Ile Leu
435 440 445

Lys Gly Gly Ser His Arg Ile Ser Gln Met Ile Thr Ser Glu Gly Asn
450 455 460

Val Val Ser Tyr Leu Asn Ile Ser Ser Ser Gln Val Arg Asp Gly Gly
465 470 475 480

Val Tyr Arg Cys Thr Ala Asn Asn Ser Ala Gly Val Val Leu Tyr Gln
485 490 495

Ala Arg Ile Asn Val Arg Gly Pro Ala Ser Ile Arg Pro Met Lys Asn
500 505 510

Ile Thr Ala Ile Ala Gly Arg Asp Thr Tyr Ile His Cys Arg Val Ile
515 520 525

Gly Tyr Pro Tyr Tyr Ser Ile Lys Trp Tyr Lys Asn Ser Asn Leu Leu
530 535 540

Pro Phe Asn His Arg Gln Val Ala Phe Glu Asn Asn Gly Thr Leu Lys
545 550 555 560

Leu Ser Asp Val Gln Lys Glu Val Asp Glu Gly Glu Tyr Thr Cys Asn
565 570 575

Val Leu Val Gln Pro Gln Leu Ser Thr Ser Gln Ser Val His Val Thr
580 585 590

Val Lys Val Pro Pro Phe Ile Gln Pro Phe Glu Phe Pro Arg Phe Ser
595 600 605

Ile Gly Gln Arg Val Phe Ile Pro Cys Val Val Val Ser Gly Asp Leu
610 615 620

Pro Ile Thr Ile Thr Trp Gln Lys Asp Gly Arg Pro Ile Pro Gly Ser
625 630 635 640

Leu Gly Val Thr Ile Asp Asn Ile Asp Phe Thr Ser Ser Leu Arg Ile
645 650 655

Ser Asn Leu Ser Leu Met His Asn Gly Asn Tyr Thr Cys Ile Ala Arg
660 665 670

Asn Glu Ala Ala Ala Val Glu His Gln Ser Gln Leu Ile Val Arg Val
675 680 685

Pro Pro Lys Phe Val Val Gln Pro Arg Asp Gln Asp Gly Ile Tyr Gly
690 695 700

Lys Ala Val Ile Leu Asn Cys Ser Ala Glu Gly Tyr Pro Val Pro Thr
705 710 715 720

Ile Val Trp Lys Phe Ser Lys Gly Ala Gly Val Pro Gln Phe Gln Pro
725 730 735

Ile Ala Leu Asn Gly Arg Ile Gln Val Leu Ser Asn Gly Ser Leu Leu
740 745 750

Ile Lys His Val Val Glu Glu Asp Ser Gly Tyr Tyr Leu Cys Lys Val
755 760 765

Ser Asn Asp Val Gly Ala Asp Val Ser Lys Ser Met Tyr Leu Thr Val
770 775 780

Lys Ile Pro Ala Met Ile Thr Ser Tyr Pro Asn Thr Thr Leu Ala Thr
785 790 795 800

Gln Gly Gln Lys Lys Glu Met Ser Cys Thr Ala His Gly Glu Lys Pro
805 810 815

Ile Ile Val Arg Trp Glu Lys Glu Asp Arg Ile Ile Asn Pro Glu Met
820 825 830

Ala Arg Tyr Leu Val Ser Thr Lys Glu Val Gly Glu Glu Val Ile Ser
835 840 845

Thr Leu Gln Ile Leu Pro Thr Val Arg Glu Asp Ser Gly Phe Phe Ser
850 855 860

Cys His Ala Ile Asn Ser Tyr Gly Glu Asp Arg Gly Ile Ile Gln Leu
865 870 875 880

Thr Val Gln Glu Pro Pro Asp Pro Pro Glu Ile Glu Ile Lys Asp Val
885 890 895

Lys Ala Arg Thr Ile Thr Leu Arg Trp Thr Met Gly Phe Asp Gly Asn
900 905 910

Ser Pro Ile Thr Gly Tyr Asp Ile Glu Cys Lys Asn Lys Ser Asp Ser
915 920 925

Trp Asp Ser Ala Gln Arg Thr Lys Asp Val Ser Pro Gln Leu Asn Ser
930 935 940

Ala Thr Ile Ile Asp Ile His Pro Ser Ser Thr Tyr Ser Ile Arg Met
945 950 955 960

Tyr Ala Lys Asn Arg Ile Gly Lys Ser Glu Pro Ser Asn Glu Leu Thr
965 970 975

Ile Thr Ala Asp Glu Ala Ala Pro Asp Gly Pro Pro Gln Glu Val His
980 985 990

Leu Glu Pro Ile Ser Ser Gln Ser Ile Arg Val Thr Trp Lys Ala Pro
995 1000 1005

Lys Lys His Leu Gln Asn Gly Ile Ile Arg Gly Tyr Gln Ile Gly Tyr
1010 1015 1020

Arg Glu Tyr Ser Thr Gly Gly Asn Phe Gln Phe Asn Ile Ile Ser Val
1025 1030 1035 1040

Asp Thr Ser Gly Asp Ser Glu Val Tyr Thr Leu Asp Asn Leu Asn Lys
1045 1050 1055

Phe Thr Gln Tyr Gly Leu Val Val Gln Ala Cys Asn Arg Ala Gly Thr
1060 1065 1070

Gly Pro Ser Ser Gln Glu Ile Ile Thr Thr Thr Leu Glu Asp Val Pro
1075 1080 1085

Ser Tyr Pro Pro Glu Asn Val Gln Ala Ile Ala Thr Ser Pro Glu Ser
1090 1095 1100

Ile Ser Ile Ser Trp Ser Thr Leu Ser Lys Glu Ala Leu Asn Gly Ile
1105 1110 1115 1120

Leu Gln Gly Phe Arg Val Ile Tyr Trp Ala Asn Leu Met Asp Gly Glu
1125 1130 1135

Leu Gly Glu Ile Lys Asn Ile Thr Thr Thr Gln Pro Ser Leu Glu Leu
1140 1145 1150

Asp Gly Leu Glu Lys Tyr Thr Asn Tyr Ser Ile Gln Val Leu Ala Phe
1155 1160 1165

Thr Arg Ala Gly Asp Gly Val Arg Ser Glu Gln Ile Phe Thr Arg Thr
1170 1175 1180

Lys Glu Asp Val Pro Gly Pro Pro Ala Gly Val Lys Ala Ala Ala Ala
1185 1190 1195 1200

Ser Ala Ser Met Val Phe Val Ser Trp Leu Pro Pro Leu Lys Leu Asn
1205 1210 1215

Gly Ile Ile Arg Lys Tyr Thr Val Phe Cys Ser His Pro Tyr Pro Thr
1220 1225 1230

Val Ile Ser Glu Phe Glu Ala Ser Pro Asp Ser Phe Ser Tyr Arg Ile
1235 1240 1245

Pro Asn Leu Ser Arg Asn Arg Gln Tyr Ser Val Trp Val Val Ala Val
1250 1255 1260

Thr Ser Ala Gly Arg Gly Asn Ser Ser Glu Ile Ile Thr Val Glu Pro
1265 1270 1275 1280

Leu Ala Lys Ala Pro Ala Arg Ile Leu Thr Phe Ser Gly Thr Val Thr
1285 1290 1295

Thr Pro Trp Met Lys Asp Ile Val Leu Pro Cys Lys Ala Val Gly Asp
1300 1305 1310

Pro Ser Pro Ala Val Lys Trp Met Lys Asp Ser Asn Gly Thr Pro Ser
1315 1320 1325

Leu Val Thr Ile Asp Gly Arg Arg Ser Ile Phe Ser Asn Gly Ser Phe
1330 1335 1340

Ile Ile Arg Thr Val Lys Ala Glu Asp Ser Gly Tyr Tyr Ser Cys Ile
1345 1350 1355 1360

Ala Asn Asn Asn Trp Gly Ser Asp Glu Ile Ile Leu Asn Leu Gln Val
1365 1370 1375

Gln Val Pro Pro Asp Gln Pro Arg Leu Thr Val Ser Lys Thr Thr Ser
1380 1385 1390

Ser Ser Ile Thr Leu Ser Trp Leu Pro Gly Asp Asn Gly Gly Ser Ser
1395 1400 1405

Ile Arg Gly Tyr Ile Leu Gln Tyr Ser Glu Asp Asn Ser Glu Gln Trp
1410 1415 1420

Gly Ser Phe Pro Ile Ser Pro Ser Glu Arg Ser Tyr Arg Leu Glu Asn
1425 1430 1435 1440

Leu Lys Cys Gly Thr Trp Tyr Lys Phe Thr Leu Thr Ala Gln Asn Gly
1445 1450 1455

Val Gly Pro Gly Arg Ile Ser Glu Ile Ile Glu Ala Lys Thr Leu Gly
1460 1465 1470

Lys Glu Pro Gln Phe Ser Lys Glu Gln Glu Leu Phe Ala Ser Ile Asn
1475 1480 1485

Thr Thr Arg Val Arg Leu Asn Leu Ile Gly Trp Asn Asp Gly Gly Cys
1490 1495 1500

Pro Ile Thr Ser Phe Thr Leu Glu Tyr Arg Pro Phe Gly Thr Thr Val
1505 1510 1515 1520

Trp Thr Thr Ala Gln Arg Thr Ser Leu Ser Lys Ser Tyr Ile Leu Tyr
1525 1530 1535

Asp Leu Gln Glu Ala Thr Trp Tyr Glu Leu Gln Met Arg Val Cys Asn
1540 1545 1550

Ser Ala Gly Cys Ala Glu Lys Gln Ala Lys Glu Ala Ala Arg Cys Lys
1555 1560 1565

Glu Phe Ser
1570

That which is claimed is:

1. Isolated nucleic acid encoding a mammalian DS-CAM
member of the immunoglobulin (Ig) superfamily of
proteins, or a fragment thereof, wherein said DS-CAM
comprises at least 7 Ig-like domains.

2. Isolated nucleic acid according to claim 1,
wherein said nucleic acid, or fragments thereof, is
selected from:

(a) DNA encoding the amino acid sequence set forth
in SEQ ID NO: 2 or SEQ ID NO: 11, or the DS-CAM coding
region of SEQ ID NO: 7, SEQ ID NO: 8 or SEQ ID NO: 9,

(b) DNA that hybridizes to the DNA of (a) under
moderately stringent conditions, wherein said DNA
encodes biologically active DS-CAM, or

(c) DNA degenerate with respect to either (a) or
(b) above, wherein said DNA encodes biologically active
DS-CAM.

3. A nucleic acid according to claim 2, wherein
said nucleic acid hybridizes under high stringency
conditions to the DS-CAM coding portion of nucleotides
SEQ ID NO: 1, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9 or
SEQ ID NO: 10.

4. A nucleic acid according to claim 2, wherein
the nucleotide sequence of said nucleic acid is
substantially the same as that set forth in SEQ ID NO: 1,
SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9 or SEQ ID NO: 10.

5. A nucleic acid according to claim 2, wherein
the nucleotide sequence of said nucleic acid is the same
as that set forth in SEQ ID NO: 1, SEQ ID NO: 7,
SEQ ID NO: 8, SEQ ID NO: 9 or SEQ ID NO: 10.

6. A nucleic acid according to claim 2, wherein
said nucleic acid is cDNA.

7. A vector containing the nucleic acid of claim 2.

8. Recombinant cells containing the nucleic acid
of claim 2.

9. An oligonucleotide comprising at least 15
nucleotides capable of specifically hybridizing with a
sequence of nucleic acids of the nucleotide sequence set
forth in SEQ ID NO: 1, SEQ ID NO: 7, SEQ ID NO: 8,
SEQ ID NO: 9 or SEQ ID NO: 10.

10. An oligonucleotide according to claim 9,
wherein said oligonucleotide is labeled with a detectable
marker.

11. An antisense oligonucleotide capable of
specifically binding to mRNA encoded by said nucleic acid
according to claim 2.

12. A kit for detecting the presence of the DS-CAM
cDNA sequence comprising at least one oligonucleotide
according to claim 10.

13. An isolated DS-CAM protein comprising at least
7 Ig-like domains.

14. A DS-CAM protein according to claim 13, further
characterized by being expressed in a significantly
higher amount in brain versus lung, liver or kidney.

15. A DS-CAM protein according to claim 13, wherein
the amino acid sequence of said protein comprises
substantially the same protein sequence set forth in
SEQ ID NO: 2 or SEQ ID NO: 11, or the DS-CAM coding region
of SEQ ID NO: 7, SEQ ID NO: 8 or SEQ ID NO: 9.

16. A DS-CAM protein according to claim 15
comprising the same amino acid sequence as the protein
sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 11, or the
DS-CAM coding region of SEQ ID NO: 7, SEQ ID NO: 8 or
SEQ ID NO: 9.

17. A DS-CAM protein according to claim 13, wherein
said protein is encoded by a nucleotide sequence
comprising substantially the same nucleotide sequence set
forth in SEQ ID NO: 1, SEQ ID NO: 7, SEQ ID NO: 8,
SEQ ID NO: 9, or SEQ ID NO: 10.

18. A DS-CAM protein according to claim 17, wherein
said protein is encoded by a nucleotide sequence
comprising SEQ ID NO: 1 or SEQ ID NO: 10.

19. A DS-CAM protein according to claim 13, wherein
said protein is encoded by a nucleotide sequence that
comprises substantially the same nucleotide sequence as
nucleotides 453-6185 set forth in SEQ ID NO: 1,
nucleotides 453-5168 set forth in SEQ ID NO: 10,
SEQ ID NO: 7, SEQ ID NO: 8 or SEQ ID NO: 9.

20. Method for expression of a DS-CAM-related
protein, said method comprising culturing cells of claim
8 under conditions suitable for expression of said DS-CAM
protein.

21. An isolated anti-DS-CAM antibody having
specific reactivity with a DS-CAM protein according to
claim 13.

22. Antibody according to claim 21, wherein said
antibody is a monoclonal antibody.

23. An antibody according to claim 21, wherein said
antibody is a polyclonal antibody.

24. A composition comprising an amount of the
antisense oligonucleotide according to claim 11 effective
to inhibit expression of a DS-CAM protein and an
acceptable hydrophobic carrier capable of passing through
a cell membrane.

25. A transgenic nonhuman mammal expressing
exogenous nucleic acid encoding a DS-CAM protein.

26. A transgenic nonhuman mammal according to claim
25, wherein said nucleic acid encoding said DS-CAM
protein has been mutated, and wherein the DS-CAM protein
so expressed is not native DS-CAM.

27. A transgenic nonhuman mammal according to claim
25, wherein the transgenic nonhuman mammal is a mouse.

28. A method for identifying nucleic acids encoding
a mammalian DS-CAM protein, said method comprising:

contacting a sample containing nucleic acids with an
oligonucleotide according to claim 9, wherein said
contacting is effected under high stringency
hybridization conditions, and identifying compounds which
hybridize thereto.

29. A method for detecting the presence of a
mammalian DS-CAM protein in a sample, said method
comprising contacting a test sample with an antibody
according to claim 21, detecting the presence of an
antibody-DS-CAM complex, and thereby detecting the
presence of a mammalian DS-CAM in said test sample.

30. Single strand DNA primers for amplification of
DS-CAM nucleic acid, wherein said primers comprise a
nucleic acid sequence derived from the nucleic acid
sequence set forth as SEQ ID NO: 1, SEQ ID NO: 10,
SEQ ID NO: 7, SEQ ID NO: 8 or SEQ ID NO: 9.

ABSTRACT OF THE DISCLOSURE

In accordance with the present invention, there
are provided novel Down Syndrome Cell Adhesion Molecule
(DS-CAM) proteins. Nucleic acid sequences encoding such
proteins and assays employing same are also disclosed.
The invention DS-CAM proteins can be employed in a
variety of ways, for example, for the production of
anti-DS-CAM antibodies, in therapeutic compositions,
and in methods employing such proteins and/or
antibodies. DS-CAM proteins are also useful in bioassays
to identify agonists and antagonists thereto.

Figure 4: SEQ ID NO:2, DS-CAM amino acid sequence (one-letter code)